=Paper=
{{Paper
|id=Vol-2699/paper11
|storemode=property
|title=Leveraging a Domain Ontology in (Neural) Learning from Heterogeneous Data
|pdfUrl=https://ceur-ws.org/Vol-2699/paper11.pdf
|volume=Vol-2699
|authors=Tomas Martin,Petko Valtchev,Abdoulaye Baniré Diallo
|dblpUrl=https://dblp.org/rec/conf/cikm/MartinVD20
}}
==Leveraging a Domain Ontology in (Neural) Learning from Heterogeneous Data==
Tomas Martin, Petko Valtchev and Abdoulaye Baniré Diallo

Centre de Recherche en Intelligence Artificielle (CRIA), UQAM, Montreal, Canada

Abstract: Injecting domain knowledge into a neural learner to alleviate the reliance on high-quality data and to improve explainability is a rapidly expanding research trend. While most of the effort has focused on regular-topology formats such as sequences and grids, we consider graph datasets. Moreover, instead of the knowledge graph (KG) embedding that underlies the majority of graph-centered methods, we propose a dedicated pattern mining-based approach. As our patterns are ontologically generalized, they serve multiple objectives: domain knowledge infusion, enhanced generalization capacity, interpretability, etc.

Keywords: Domain Ontology, Symbolic methods, Sub-symbolic methods, Neural networks, Graph pattern mining

1. Introduction

Nowadays, implementing decision support systems to help practitioners in complex activities has become common practice in many fields. Traditionally, many of these systems have used machine learning to predict the outcome of a specific problem in the user's environment and have used the prediction to suggest concrete actions. Deep learning has arrived with a promise to expand the areas where automation is successfully applied in problem solving, hence the expectation that high-quality decision support will spread widely.

2. Motivation

Daily activities in the agro-industrial sector, e.g. the maintenance of a dairy farm, like those in other areas related to life sciences, generate large amounts of data. The underlying data sources reflect complementary aspects such as farm yield, environment, animal health, genetics, etc. The recent trend of precision agriculture looks at exploiting this data to support the decision making of domain stakeholders [3]: farmers, agronomists, dairy companies, insurers, etc.
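As a toy illustration of how such heterogeneous records could sit side by side in one graph-shaped dataset (all identifiers and labels below are invented for the sketch, not taken from the paper's actual data), a labelled multigraph can hold yield, health and genetics facts about a single animal:

```python
# Hypothetical sketch: a doubly-labelled multigraph holding heterogeneous
# dairy facts (health, genetics, production) about one animal.
from collections import defaultdict

class LabelledGraph:
    """Vertices carry type labels; edges carry property labels (multigraph)."""
    def __init__(self):
        self.vertex_type = {}
        self.edges = defaultdict(list)  # vertex -> [(property, vertex), ...]

    def add_vertex(self, v, vtype):
        self.vertex_type[v] = vtype

    def add_edge(self, u, prop, v):
        self.edges[u].append((prop, v))

g = LabelledGraph()
g.add_vertex("cow42", "Cow")
g.add_vertex("lact1", "Lactation")      # production aspect
g.add_vertex("amox", "Amoxicillin")     # animal-health aspect
g.add_vertex("sireA", "Cow")            # genetics aspect
g.add_edge("cow42", "hasLactation", "lact1")
g.add_edge("cow42", "treatedWith", "amox")
g.add_edge("cow42", "hasAncestor", "sireA")

print(g.vertex_type["cow42"], len(g.edges["cow42"]))
```

The point of the sketch is only that one vertex can anchor edges coming from several otherwise separate data sources.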
However, predicting or learning representations on such complex domains typically requires the availability of large amounts of data of sufficiently high quality. Unfortunately, in practice, such datasets are not always readily available. Conversely, large quantities of machine-readable expert knowledge often do exist and could potentially complement the available data. Since they reflect, at least partly, the expertise that underlies decision making in the field, it is only natural to look for ways to inject that knowledge into the learning process, so as to guide it and to compensate for the scarceness of high-quality data.

For several decades, ontologies, i.e. structured representations of domain concepts and their relations [1], have been promoted as the appropriate tool for making domain knowledge available for machine processing [2].

Yet, in order to be effective, any recommendation will have to reflect existing practices and, more generally, at least partly reflect the general knowledge of the domain. For instance, at the end of each lactation a cow goes dry for a while. Yet there is no straightforward way to train a neural model on milk yield data: the ensuing abrupt drop in milk yield is hard to digest for, at least, the most popular deep learning architectures [4]. Indeed, these models do not seem to properly grasp the dynamics of a cow's life-cycle, e.g. lactation, calvings, drying, etc.

While there are still work-arounds left to explore, one legitimate research question is whether injecting some domain knowledge would help here. In a broader approach, we investigate the impact of feeding complementary data, e.g. on genetics and animal health, and of organizing the overall dataset under a domain ontology (DO) that provides additional descriptive knowledge.

While supplementing a neural learner with domain knowledge stemming from an ontology is definitely appealing, it is also a challenging task, mainly due to the "impedance mismatch", i.e. the divergence in the respective levels of knowledge expression and manipulation [5].

Proceedings of the CIKM 2020 Workshops, October 19-20, Galway, Ireland.
email: martin.tomas@courrier.uqam.ca (T. Martin); valtchev.petko@uqam.ca (P. Valtchev); diallo.abdoulaye@uqam.ca (A.B. Diallo)
© 2020 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073.

3. Current State Of The Art

While symbolic representations, as a way to capture knowledge, have clearly dominated the AI field since its inception, sub-symbolic ones, in the form of trained neural networks, have recently gained rapidly in popularity and use [6]. By trading the discrete, man-made (i.e. modelling) entities of the former for machine-made (artificial) and loosely defined "patterns", the latter break free of prior knowledge in order to, arguably, benefit from a more powerful yet harder to interpret representation language. At their core, information is distilled throughout the network as a set of waves (or pulses) representing the captured knowledge.

In a broader scope, injecting domain knowledge into a machine learning process has been extensively researched and has proven helpful in many practical situations [7]. More recently, since deep learning moved centre stage, the focus has shifted to making neural networks collaborate with symbolic knowledge sources, mostly knowledge graphs (KG) and, somewhat more modestly, domain ontologies. In [5], the authors propose a classification of methods for feeding domain knowledge to artificial neural networks (ANNs), in particular deep ones. Their own proposal, called knowledge-infused learning (K-IL), addresses a variety of issues with ANNs, in particular reliance on large datasets of sufficient quality, biases in training data selection, complexity, etc. The proposed answer is a spectrum of fine-grained transformations of the ANN architecture reflecting the content of a KG, ranging from correcting the loss function to modifying the propagation through the network via connection weights.

The broader trend of using KGs in the form of embeddings of vertices, edges or both into a vector space, e.g. to support various natural language processing tasks, has been highly prolific for almost a decade now (see [8] for a somewhat outdated survey). While initial work by Bordes et al. [9, 10] looked at embedding a triple from a KG using energy-based methods to force plausible combinations of component embeddings, in [11] the focus is exclusively on vertices, i.e. domain entities. The proposed RDF2vec method generates a set of entity sequences, through random walks and iterative neighbourhood encoding techniques, which are then fed to word embedding methods. In a medical context, the authors of [12] present a somewhat different approach to leveraging a KG in neural learning: in order to assess patient risk from a series of health events, they translate the neighbourhoods of an event-centred KG into attention filters for an LSTM-based ANN.

Overall, KG embedding is not straightforwardly portable to DOs: in the case of KGs, amalgamation is favoured by them being on the same abstraction level as the training data. In contrast, classes and properties from a DO represent abstractions, i.e. sets of data objects and object-to-object links, hence the apparent mismatch with the instance-centered modus operandi of an ANN. Yet, given the strive for (proper) generalization in ANNs, the ontological structure, with its capacity to generalize along expert-validated conceptual hierarchies (and property hierarchies, for that matter), is a natural ally.

Nonetheless, a few studies have tackled the exploitation of generic knowledge from a DO in neural learning. For instance, in [13] the authors exploit a DO (a topic hierarchy, in actuality) of sound events to enhance a neural classifier. They propose to replicate the hierarchical structure of the DO in the ANN topology by: (1) allotting a layer per level in the is-a hierarchy and (2) enforcing fixed distance values between pairs of example embeddings, which roughly translate the examples' topological distance within the hierarchy. In a similar vein, the method in [14] simulates the topology of the DO graph in learning the representations of its classes and properties. A class is thus reduced to the union of its data properties and those of its sub-classes and related classes. In a following step, the method learns instance representations from the representations in the DO and uses them in behaviour prediction.

Besides, different ways of making ontologies and ANNs collaborate have been explored, e.g. ontology learning [15] or neural reasoning with ontologies [16]. For example, [17] approaches the latter task as a translation problem with noisy data.

On a broader scope, while feature vector-oriented ANNs have shined on sequence- and grid-shaped data, i.e. where the values are arguably more important than the (highly regular) topology, graph data, due to its inherent sparsity, requires more fine-grained generalization (e.g. chemical functional groups, biological pathways, telecom network configurations). Graph Convolutional Neural Networks (GCNNs) constitute a recent and promising approach for learning such regularities [18, 19]. By applying convolution layers on top of each other, they recursively aggregate n-th order neighbourhood information from the graph and can achieve good generalization on such datasets. Yet, due to their inherent bias toward frequent regularities, the very local, rare and context-specific ones will arguably be missed. And, clearly, this behaviour compounds whenever quality data prove scarce.
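The recursive n-th order neighbourhood aggregation behind stacked graph convolutions can be sketched in plain Python. This is a toy stand-in, not an actual GCNN: real layers use learned weight matrices and non-linearities, and the three-vertex graph below is invented for illustration.

```python
# Toy neighbourhood aggregation: each "layer" averages a vertex's scalar
# feature with those of its neighbours; stacking layers widens the horizon.

def aggregate(features, adjacency):
    """One convolution-like step over a {vertex: [neighbours]} graph."""
    new = {}
    for v, neigh in adjacency.items():
        vals = [features[v]] + [features[u] for u in neigh]
        new[v] = sum(vals) / len(vals)
    return new

def gcnn_like(features, adjacency, layers=2):
    """After k layers, a vertex has absorbed k-th order neighbourhood info."""
    for _ in range(layers):
        features = aggregate(features, adjacency)
    return features

# Path graph a - b - c: only "a" carries signal initially.
adjacency = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
features = {"a": 1.0, "b": 0.0, "c": 0.0}
print(gcnn_like(features, adjacency, layers=2))
```

After two layers, vertex c (two hops from a) receives a non-zero value, which is exactly the "recursive aggregation" the paragraph above describes; it also hints at the averaging-out of rare, local signal that the authors criticize.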
Beyond pure generalization capabilities, dealing with actionable and surprising patterns that mix different abstraction levels is to be expected: conceptually, a sequentially layered generalization procedure might not prove enough to extract such regularities.

Taking a step back, we consider three ongoing trends, each following a founding principle: (i) K-IL supports the use of external domain knowledge as a way to improve both predictive power and explainability; (ii) G(C)NN approaches consider preserving topology as critical when working on graph data; (iii) contextual mechanisms lead to better results on both static (e.g. text translation) and dynamic (e.g. user behaviour) predictive tasks [20].

To the best of our knowledge, no prior research has jointly addressed the above three concerns. Here, we present a novel approach for learning in complex domains that does so. By delegating most of the knowledge/pattern extraction effort to a dedicated symbolic method, we subsequently feed those patterns as input features to an architecture-agnostic neural learner, thus offering ontologically-generalized, graph-shaped features that a priori overlap with a GNN's convolved high-level patterns. Nevertheless, ontology-based generalization plays nicely with robustness properties: by going beyond a mere boolean encoding of attributes (i.e. vertices, edges) with the help of a DO's conceptual structure, it helps the symbolic learner avoid overfitting pitfalls.

4. Vision & Approach

First, by bringing some ontological concepts into the data as higher-order regularities, we aim to make explicit the shared conceptual structure that remains invisible in the raw data. The rationale is that while exact values may mismatch, more abstract types describing those values would coincide. For instance, two groups of lactating cows may be treated for mastitis (a common bacterial infection of the udder) using amoxicillin and penicillin, respectively. Now, knowing that these are both 𝛽-lactams helps extend a common sub-graph comprising, at least, nodes for cow and mastitis, with a further node for that class of antibiotics. Obviously, this increase in the shared portions of the data graphs w.r.t. their raw versions would not be possible without an ontology covering the antibiotics. In a more general vein, inserting typing information and property generalizations helps reveal hidden commonalities that would not be easily spotted by either a human expert or a sub-symbolic learner.

Next, our goal is to find all significant fragments of such shared structure in a set of data graphs. These include abstractions on both vertices (ontology classes) and edges (ontology properties). As an illustrative example, Figure 1 presents a possible pattern illustrating possible causes for a shorter than average first lactation of a young cow: here, frequently, both the young cow and its ancestor have been treated with different kinds of antibiotics.

Figure 1: An example of ontological graph pattern.

The resulting graph structures can be qualified as doubly-labelled (i.e. on both vertices and edges) multigraphs. Practically, we first discover the interesting patterns and then, in a feature engineering step, assign them as higher-level descriptors of the matching data graphs.

Another palpable advantage of using the ontology-based patterns is that they offer an integrated view of the shared structure: edges standing for properties connect class vertices, thus providing context to each of them. On the other hand, pattern components, as well as whole patterns, pertain to potentially varying abstraction levels.

More concretely, Figure 2 details our hybrid strategy, where graph patterns are first mined (step 3) and then fed (step 5) into a neural network (step 6), with graph data supported by a DO as our main input (steps 2 and 1, respectively), complementary to regular tabular data (step 4). The mining of ontological patterns from that graph data uses the domain ontology as the backbone for the exploration (e.g. ontological types, resources as vertices and properties as edges). Between steps 3 and 5, an optional post-processing step can further refine the patterns to emphasize contrasts, or synthesize by approximating, if required by the learning task. The resulting ontological graph patterns allow the original data to be encoded with the new features supported by domain knowledge before feeding the augmented data to the ANN (steps 5 and 6).
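The amoxicillin/penicillin example can be made concrete with a small sketch of ontological generalization followed by pattern-as-feature encoding. Everything below is a hypothetical simplification: the is-a hierarchy is three entries long, records are flat sets rather than the doubly-labelled multigraphs the approach actually mines, and all names are invented.

```python
# Toy is-a hierarchy: child class -> direct super-class.
IS_A = {
    "amoxicillin": "beta-lactam",
    "penicillin": "beta-lactam",
    "beta-lactam": "antibiotic",
}

def is_subclass(cls, ancestor):
    """Walk the is-a hierarchy upwards (reflexive: a class subsumes itself)."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = IS_A.get(cls)
    return False

def matches(record, pattern):
    """A record matches when every pattern class generalizes some record value."""
    return all(any(is_subclass(v, cls) for v in record) for cls in pattern)

def encode(records, patterns):
    """Feature engineering step: one boolean feature per ontological pattern."""
    return [[int(matches(r, p)) for p in patterns] for r in records]

records = [{"cow", "mastitis", "amoxicillin"},
           {"cow", "mastitis", "penicillin"}]
pattern = ["cow", "mastitis", "beta-lactam"]
print(encode(records, [pattern]))
```

Neither raw record contains the other's antibiotic, yet both match the generalized pattern, so both receive the same boolean feature — the "shared structure made explicit" that the paragraphs above argue for.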
Figure 2: High-level view of our hybrid learning process.

Pattern mining [21] aims at extracting recurrent data fragments, a.k.a. patterns, capturing the most relevant information possible. A mining task is defined by a pair of languages, one for data records and one for patterns, together with a relevance (interestingness) criterion. The typical criterion is frequency of appearance, but other criteria, such as utility or domain-related ones, are possible. Moreover, an effective mining method requires a general strategy for pattern space traversal and a technique to perform pattern-to-data record matching. The latter revolves around computing a variation of sub-graph isomorphism, here integrating the conceptual structure of an ontology. Typically, the former entails defining a spanning tree of the pattern space and a canonical representation of graph patterns, so as to avoid generating multiple copies of the same pattern [22].

Ontologies have been used in frequent pattern mining for some time to guide the exploration of complex pattern spaces such as sequences of objects or simple graphs [23, 24]. For predictive tasks exploiting graph pattern mining, a few successful techniques exist, such as quantitative structure-activity relationships (QSARs) [25], optimized objective functions [26] or dedicated pattern ranking metrics [27] exploiting external domain knowledge. While ontologies and patterns have been combined before, to the best of our knowledge, no mining method has targeted data of such complexity.

The downside of the approach is its sensitivity to the pattern frequency threshold and the related potential combinatorial explosion in the result. While this is a serious cost issue with graph patterns, possible mitigation strategies exist, e.g. using condensed representations thereof, such as closed patterns [28].

Overall, the expected immediate benefits of the ontological knowledge injection into the neural learning process include higher accuracy of the predictive architecture and faster convergence.

References

[1] T. Gruber, et al., A translation approach to portable ontology specifications, Knowledge Acquisition 5 (1993) 199–220.
[2] F. Kramer, T. Beißbarth, Working with ontologies, in: Bioinformatics, Springer, 2017, pp. 123–135.
[3] A. Barbosa, et al., Modeling yield response to crop management using convolutional neural networks, Computers and Electronics in Agriculture 170 (2020) 105197.
[4] C. Frasco, et al., Towards an effective decision-making system based on cow profitability using deep learning, in: Proc. of the 12th ICAART, Valletta, Malta, 2020, pp. 949–958.
[5] A. Sheth, et al., Shades of knowledge-infused learning for enhancing deep learning, IEEE Internet Computing 23 (2019) 54–63.
[6] Y. Bengio, et al., Representation learning: A review and new perspectives, IEEE Transactions on PAMI 35 (2013) 1798–1828.
[7] D. Dou, et al., Semantic data mining: A survey of ontology-based approaches, in: IEEE ICSC 2015, IEEE, 2015, pp. 244–251.
[8] Q. Wang, et al., Knowledge graph embedding: A survey of approaches and applications, IEEE TKDE 29 (2017) 2724–2743.
[9] A. Bordes, et al., Translating embeddings for modeling multi-relational data, in: Advances in Neural Information Processing Systems, 2013, pp. 2787–2795.
[10] A. Bordes, et al., A semantic matching energy function for learning with multi-relational data: Application to word-sense disambiguation, Machine Learning 94 (2014) 233–259.
[11] P. Ristoski, H. Paulheim, RDF2vec: RDF graph embeddings for data mining, in: ISWC, Springer, 2016, pp. 498–514.
[12] C. Yin, et al., Domain knowledge guided deep learning with electronic health records, in: 2019 IEEE ICDM, IEEE, 2019, pp. 738–747.
[13] A. Jiménez, et al., Sound event classification using ontology-based neural networks, in: Proc. of the Annual Conference on NeurIPS, 2018.
[14] N. Phan, et al., Ontology-based deep learning for human behavior prediction with explanations in health social networks, Information Sciences 384 (2017) 298–313.
[15] M. Casteleiro, et al., Ontology learning with deep learning: a case study on patient safety using PubMed, in: SWAT4LS, 2016.
[16] P. Hohenecker, T. Lukasiewicz, Deep learning for ontology reasoning, CoRR (2017).
[17] B. Makni, J. Hendler, Deep learning for noise-tolerant RDFS reasoning, Semantic Web 10 (2019) 823–862.
[18] K. Xu, et al., How powerful are graph neural networks?, arXiv:1810.00826 [cs, stat] (2019). URL: http://arxiv.org/abs/1810.00826.
[19] H. Yuan, et al., XGNN: Towards model-level explanations of graph neural networks, 26th ACM SIGKDD (2020) 430–438.
[20] A. Vaswani, et al., Attention is all you need, in: Advances in Neural Information Processing Systems, 2017, pp. 5998–6008.
[21] C. Aggarwal, J. Han, Frequent Pattern Mining, 2014 ed., Springer, 2014.
[22] X. Yan, J. Han, gSpan: Graph-based substructure pattern mining, in: Proc. of the IEEE ICDM 2002, IEEE, 2002, pp. 721–724.
[23] M. Adda, et al., A framework for mining meaningful usage patterns within a semantically enhanced web portal, in: Proc. of the 3rd C* CCSE, 2010, pp. 138–147.
[24] A. Cakmak, G. Ozsoyoglu, Taxonomy-superimposed graph mining, in: Proc. of the 11th EDBT, ACM, 2008, pp. 217–228.
[25] S. Nijssen, J. Kok, Frequent graph mining and its application to molecular databases, in: IEEE Transactions on Systems, Man and Cybernetics, volume 5, IEEE, 2004, pp. 4571–4577.
[26] H. Saigo, et al., gBoost: a mathematical programming approach to graph classification and regression, Machine Learning 75 (2009) 69–89.
[27] E. Spyropoulou, et al., Mining interesting patterns in multi-relational data with n-ary relationships, in: International Conference on Discovery Science, Springer, 2013, pp. 217–232.
[28] X. Yan, J. Han, CloseGraph: mining closed frequent graph patterns, in: Proc. of the 9th ACM SIGKDD, ACM, 2003, pp. 286–295.