<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Leveraging a domain ontology in (neural) learning from heterogeneous data</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Tomas Martin</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Petko Valtchev</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Abdoulaye Baniré Diallo</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Centre de Recherche en Intelligence Artificielle (CRIA), UQAM</institution>
          ,
          <addr-line>Montreal</addr-line>
          ,
          <country country="CA">Canada</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Injecting domain knowledge into a neural learner to alleviate reliance on high-quality data and improve explainability is a rapidly expanding research trend. While most of the effort has focused on regular-topology formats such as sequences and grids, we consider graph datasets. Moreover, instead of knowledge graph (KG) embedding, which underlies the majority of graph-centered methods, we propose a dedicated pattern mining-based approach. As our patterns are ontologically generalized, they achieve multiple objectives: domain knowledge infusion, generalization capacity enhancement, interpretability, etc.</p>
      </abstract>
      <kwd-group>
        <kwd>Domain Ontology</kwd>
        <kwd>Symbolic methods</kwd>
        <kwd>Sub-symbolic methods</kwd>
        <kwd>Neural networks</kwd>
        <kwd>Graph pattern mining</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
    </sec>
    <sec id="sec-2">
      <title>2. Motivation</title>
      <p>Nowadays, implementing decision support systems to Daily activities in agro-industrial sector, e.g. a
maintehelp practitioners in complex activities has become a nance of a dairy farm, like those in other areas related
current practice in many fields. Many of these sys- to life sciences, generate large amounts of data. The
tems, traditionally, have used machine learning to pre- underlying data sources reflect complementary aspects
dict the outcome of a specific problem in the user’s en- such as farm yield, environment, animal health,
gevironment and use the prediction to suggest concrete netics, etc. The recent trend of precision(-based)
agriactions. Deep learning has arrived with a promise to culture looks at exploiting this data to support the
deexpand the areas where automation is successfully ap- cision making of domain stake-holders [3]: farmers,
plied in problem solving, hence the expectation for agronomists, dairy companies, insurers, etc.
high-quality decision support to profuse. Yet, in order to be efective, any recommendation</p>
      <p>However, predicting or learning representations on will have to reflect existing practices and, more
genersuch complex domains typically requires the availabil- ally, at least partly reflect the general knowledge from
ity of large amounts of data of suficiently high qual- the domain. For instance, at the end of each lactation a
ity. Unfortunately, in practice, such datasets are not cow gets dry for a while. Yet there is no a
straightforalways readily available. Conversely, often quantities ward way to train a neural model on milk yield data:
of machine-readable expert knowledge do exist, and The ensuing abrupt drop in milk yield is hard to digest
could potentially complement already available data. for, at least, the most popular deep learning
architecSince they reflect at least partly the expertise that un- tures [4]. Indeed, these models do not seem to properly
derlies decision making in the field, it is only natu- grasp the dynamics in a cow life-cycle, e.g. lactation,
ral to look for ways to inject that knowledge into the calvings, drying, etc.
learning process to try to guide it and compensate the While there are still work-arounds left to explore,
scarceness of high-quality data. one legitimate research question is whether injecting</p>
      <p>For several decades, ontologies, i.e. structured rep- some domain knowledge would help here. In a broader
resentation of domain concepts and their relations [1], approach, we investigate the impact of feeding
comhave been promoted as the appropriate tool for mak- plementary data, e.g. on genetics and animal health,
ing domain knowledge available for machine process- and organizing the overall dataset under a domain
oning [2]. tology (DO) providing additional descriptive
knowledge.</p>
      <p>While supplementing a neural learner with domain
knowledge stemming from an ontology is definitely
appealing, it is also a challenging task, mainly due to
the “impedance mismatch”, i.e. the divergence in the
respective levels of knowledge expression and
manipulation [5].</p>
    </sec>
    <sec id="sec-3">
      <title>3. Current State Of The Art</title>
      <p>While symbolic representations, as a way to capture knowledge, have clearly dominated the AI field since its inception, recently sub-symbolic ones – in the form of trained neural networks – have rapidly gained in popularity and use [6]. By trading the discrete and man-made (i.e. modelling) entities of the former for machine-made (artificial) and loosely defined “patterns”, the latter break free of prior knowledge in order to, arguably, benefit from a more powerful yet difficult-to-interpret representation language. At its core, information is distilled throughout a network as a set of waves (or pulses) representing captured knowledge.</p>
      <p>In a broader scope, injecting domain knowledge into a machine learning process has been extensively researched and proven helpful in many practical situations [7]. More recently, since deep learning has moved centre stage, the focus has shifted to making neural networks collaborate with symbolic knowledge sources, mostly knowledge graphs (KG) and, somewhat more modestly, domain ontologies. In [5], the authors propose a classification of methods for feeding domain knowledge to artificial neural networks (ANNs), in particular, to deep ones. Their own proposal, called knowledge-infused learning (K-IL), addresses a variety of issues with ANNs, in particular, reliance on large datasets of sufficient quality, biases in training data selection, complexity, etc. The proposed answer represents a spectrum of fine-grained transformations of the ANN architecture reflecting the content of a KG, ranging from correcting the loss function to modifying the propagation through the network via connection weights.</p>
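      <p>To make the loss-correction end of that spectrum concrete, the sketch below (a minimal PyTorch illustration of the general idea, not the formulation of [5]; the penalty term and its weight are hypothetical) augments a supervised loss with a term that punishes confident predictions contradicting facts drawn from a KG.</p>
      <preformat>
import torch
import torch.nn.functional as F

def knowledge_infused_loss(logits, targets, kg_violation, lambda_kg=0.1):
    """Hypothetical knowledge-infused loss: standard cross-entropy plus a
    penalty that grows when confident predictions contradict KG facts.
    kg_violation: tensor in [0, 1], one score per example, precomputed
    from the knowledge graph (1.0 = fully inconsistent with the KG)."""
    supervised = F.cross_entropy(logits, targets)
    probs = F.softmax(logits, dim=1)
    confidence = probs.max(dim=1).values
    kg_penalty = (confidence * kg_violation).mean()
    return supervised + lambda_kg * kg_penalty
</preformat>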
      <p>The broader trend of using KGs in the form of embeddings – of vertices, edges or both – into a vector space, e.g. in order to support various natural language processing tasks, has been highly prolific for almost a decade now (see [8] for a somewhat outdated survey). While the initial work by Bordes et al. [9, 10] looked at embedding a triple from a KG using energy-based methods to force plausible combinations of component embeddings, in [11] the focus is exclusively on vertices, i.e. domain entities. The proposed RDF2vec method generates a set of entity sequences, through random walks and iterative neighbourhood encoding techniques, which are then fed to word embedding methods. In a medical context, the authors of [12] present a somewhat different approach toward leveraging a KG in neural learning: in order to assess patient risk from a series of health events, they translate the neighbourhoods of an event-centred KG into attention filters for an LSTM-based ANN.</p>
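      <p>To illustrate the energy-based view of [9], the following sketch scores a (head, relation, tail) triple TransE-style, assigning low energy to plausible combinations of component embeddings; the toy entities and relations below are ours, for illustration only.</p>
      <preformat>
import numpy as np

def transe_energy(entity_emb, relation_emb, head, relation, tail):
    """TransE-style energy of a (head, relation, tail) triple: ||h + r - t||.
    Plausible triples are trained to receive a low energy."""
    return np.linalg.norm(entity_emb[head] + relation_emb[relation] - entity_emb[tail])

# Toy vocabulary with random embeddings, for illustration only.
rng = np.random.default_rng(0)
entity_emb = {e: rng.normal(size=50) for e in ("cow_42", "mastitis", "amoxicillin")}
relation_emb = {r: rng.normal(size=50) for r in ("diagnosedWith", "treatedWith")}
print(transe_energy(entity_emb, relation_emb, "cow_42", "treatedWith", "amoxicillin"))
</preformat>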
      <p>Overall, KG embedding is not straightforwardly portable to DOs: in the case of KGs, amalgamation is favoured by them being on the same abstraction level as the training data. In contrast, classes and properties from a DO represent abstractions, i.e. sets of data objects and object-to-object links, hence the apparent mismatch with the instance-centered modus operandi of an ANN. Yet given the strive for (proper) generalization in ANNs, the ontological structure, with its capacity to generalize along expert-validated conceptual hierarchies (and property ones, for that matter), is a natural ally.</p>
      <p>Nonetheless, a few studies have tackled the exploitation of generic knowledge from a DO in neural learning. For instance, in [13], the authors exploit a DO (a topic hierarchy, in actuality) of sound events to enhance a neural classifier. They propose to replicate the hierarchical structure of the DO in the ANN topology by: (1) allotting a layer per level in the is-a hierarchy and (2) enforcing fixed distance values between pairs of example embeddings, which roughly translate the examples’ topological distance within the hierarchy. In a similar vein, the method in [14] simulates the topology of the DO graph in learning the representations of its classes and properties. A class is thus reduced to the union of its data properties, those of its sub-classes and of related classes. In a following step, the method learns instance representations from the class representations in the DO, and uses them in behaviour prediction.</p>
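      <p>A rough sketch of the second ingredient, under our simplified reading of [13] (the helper and its arguments are hypothetical): a regularizer that pulls the embedding distance of two examples toward the distance separating their classes in the is-a hierarchy.</p>
      <preformat>
import torch

def hierarchy_distance_penalty(emb_a, emb_b, tree_distance, margin=0.0):
    """Regularizer pushing the embedding distance of two examples toward the
    (rescaled) distance separating their classes in the is-a hierarchy;
    meant to be added to the usual classification loss during training."""
    d = torch.norm(emb_a - emb_b, dim=-1)
    return torch.clamp((d - tree_distance).abs() - margin, min=0.0).mean()
</preformat>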
      <p>Besides, different ways of making ontologies and ANNs collaborate have been explored, e.g. ontology learning [15] or neural reasoning with ontologies [16]. For example, [17] approaches the latter task as a translation problem with noisy data.</p>
      <p>On a broader scope, while feature vector-oriented ANNs have shined on sequence- and grid-shaped data, i.e. with values arguably more important than the – highly regular – topology, graph data, due to its inherent sparsity, requires more fine-grained generalization (e.g. chemical functional groups, biological pathways, telecom network configurations, etc.). Graph Convolutional Neural Networks (GCNN) constitute a recent and promising approach for learning such regularities [18, 19]. By applying convolution layers on top of each other, they recursively aggregate k-th-order neighbourhood information from the graph and can achieve good generalization on such datasets. Yet due to their inherent bias toward frequent regularities, the very local, rare and context-specific ones will arguably be missed. And, clearly, this behaviour compounds whenever quality data prove scarce.</p>
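      <p>The recursive neighbourhood aggregation at the heart of GCNNs can be sketched as follows (a generic mean-aggregation message-passing layer, not any specific architecture from [18, 19]):</p>
      <preformat>
import numpy as np

def gnn_layer(features, adjacency, weight):
    """One message-passing layer: every vertex averages the feature vectors
    of its neighbours (and itself), then applies a learned linear map and a
    ReLU. Stacking k such layers aggregates k-th-order neighbourhood info."""
    n = adjacency.shape[0]
    adj_self = adjacency + np.eye(n)
    degrees = adj_self.sum(axis=1, keepdims=True)
    aggregated = (adj_self @ features) / degrees
    return np.maximum(aggregated @ weight, 0.0)
</preformat>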
      <p>Beyond pure generalization capabilities, dealing with actionable and surprising patterns mixing different abstraction levels is to be expected: conceptually, a sequentially layered generalization procedure might not prove enough to extract such regularities.</p>
      <p>Taking a step back, we consider three ongoing trends, each one following a founding principle: (i) K-IL supports the use of external domain knowledge as a way to bring improvements in both predictive power and explainability; (ii) G(C)NN approaches consider preserving topology as critical when working on graph data; (iii) contextual mechanisms lead to better results on both static (e.g. text translation) and dynamic (e.g. user behaviour) predictive tasks [20].</p>
      <p>To the best of our knowledge, no prior research has jointly addressed the above three concerns. Here, we present a novel approach for learning in complex domains that does so. By delegating most of the knowledge/pattern extraction effort to a dedicated symbolic method, we subsequently feed the extracted patterns as input features to an architecture-agnostic neural learner, thus offering ontologically-generalized, graph-shaped features that a priori overlap with a GNN’s convolved high-level patterns. Moreover, ontology-based generalization plays nice with robustness properties: by going beyond a mere boolean encoding of attributes (i.e. vertices, edges) with the help of a DO’s conceptual structure, it helps the symbolic learner avoid overfitting pitfalls.</p>
    </sec>
    <sec id="sec-4">
      <title>4. Vision &amp; Approach details</title>
      <p>First, by bringing some ontological concepts into the data as higher-order regularities, we aim to make explicit the shared conceptual structure that remains invisible in the raw data. The rationale is that while exact values may mismatch, more abstract types describing those values would coincide. For instance, two groups of lactating cows may be treated for mastitis – a common bacterial infection of the udder – by using amoxicillin and penicillin, respectively. Now knowing that these are both β-lactams helps extend a common sub-graph comprising, at least, nodes for cow and mastitis, with a further node for that class of antibiotics. Obviously, this increase in the shared portions of the data graphs w.r.t. their raw versions would not be possible without an ontology covering the antibiotics. In a more general vein, inserting typing information and property generalizations helps reveal hidden commonalities that would not easily be spotted either by a human expert or by a sub-symbolic learner.</p>
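      <p>The following sketch illustrates that kind of ontological generalization on the antibiotics example: concrete vertex labels are lifted to a common ancestor class in the DO, so that two raw graphs differing on the drug still share a sub-graph. The tiny is-a fragment and helper names are ours, for illustration only.</p>
      <preformat>
# Tiny is-a fragment of a hypothetical DO, mapping a class to its parent.
IS_A = {
    "amoxicillin": "beta-lactam",
    "penicillin": "beta-lactam",
    "beta-lactam": "antibiotic",
}

def ancestors(label):
    """A label together with all of its ontological ancestors."""
    chain = [label]
    while chain[-1] in IS_A:
        chain.append(IS_A[chain[-1]])
    return chain

def most_specific_common_class(label_a, label_b):
    """Most specific DO class subsuming both labels, or None."""
    candidates = set(ancestors(label_b))
    for c in ancestors(label_a):
        if c in candidates:
            return c
    return None

# Two cows treated with different drugs still share a 'beta-lactam' vertex.
print(most_specific_common_class("amoxicillin", "penicillin"))
</preformat>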
      <p>Next, our goal is to find all significant fragments of such shared structure in a set of data graphs. These include abstractions on both vertices (ontology classes) and edges (ontology properties). As an illustrative example, Figure 1 presents a possible pattern, illustrating possible causes for a shorter-than-average first lactation of a young cow. Here, frequently, both the young cow and its ancestor have been treated with different kinds of antibiotics.</p>
      <fig id="fig-1">
        <label>Figure 1</label>
        <caption>
          <p>An example of ontological graph pattern.</p>
        </caption>
      </fig>
      <p>The resulting graph structures can be qualified as doubly-labelled multigraphs, i.e. labelled on both vertices and edges. Practically, we first discover the interesting patterns and then, in a feature engineering step, we assign them as higher-level descriptors of the matching data graphs.</p>
      <p>Another palpable advantage of using the ontology-based patterns is that they offer an integrated view of the shared structure: edges standing for properties connect class vertices, thus providing context to each of them. On the other hand, pattern components, as well as whole patterns, pertain to potentially varying abstraction levels.</p>
      <p>More concretely, Figure 2 details our hybrid strategy where graph patterns are first mined (step 3) and then fed (step 5) into a neural network (step 6), with graph data supported by a DO as our main input (steps 2 and 1, respectively), complementary to regular tabular data (step 4). The mining of ontological patterns from that graph data uses the domain ontology as a backbone for the exploration (e.g. ontological types, resources as vertices and properties as edges). Between steps 3 and 5, an optional post-processing step can further refine the patterns to emphasize contrasts, or synthesize them by approximating, if required by the learning task. The resulting ontological graph patterns allow the original data to be encoded with the new features supported by domain knowledge before feeding the augmented data to the ANN (steps 5 and 6).</p>
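      <p>Steps 5 and 6 then amount to a simple feature-engineering scheme, sketched below under the simplifying assumption that pattern matching reduces to a boolean occurrence test: each data graph becomes a binary vector indexed by the mined ontological patterns, which any off-the-shelf neural learner can consume.</p>
      <preformat>
import numpy as np

def encode_with_patterns(graphs, patterns, matches):
    """Step 5 (sketch): one row per data graph, one boolean column per mined
    ontological pattern; matches(pattern, graph) stands for the ontology-aware
    sub-graph isomorphism test discussed below."""
    X = np.zeros((len(graphs), len(patterns)))
    for i, g in enumerate(graphs):
        for j, p in enumerate(patterns):
            if matches(p, g):
                X[i, j] = 1.0
    return X
</preformat>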
      <p>Pattern mining [21] aims at extracting recurrent data fragments, a.k.a. patterns, capturing the most relevant information possible. A mining task is defined by a pair of languages, one for data records and one for patterns, and a relevance (interestingness) criterion. The typical criterion is frequency of appearance, but other criteria such as utility or some domain-related ones are possible. Moreover, an effective mining method requires a general strategy for pattern space traversal and a technique to perform pattern-to-data-record matching. The latter revolves around computing a variation of sub-graph isomorphism, here integrating the conceptual structure of an ontology. Typically, the former entails defining a spanning tree of the pattern space and a canonical representation of graph patterns to avoid generating multiple copies of the same pattern [22].</p>
      <p>Ontologies have been used in frequent pattern mining to guide the exploration of complex pattern spaces, such as sequences of objects or simple graphs, for some time [23, 24]. For predictive tasks exploiting graph pattern mining, a few successful techniques exist, such as quantitative structure-activity relationships (QSARs) [25], optimizing objective functions [26] or dedicated pattern ranking metrics [27] exploiting external domain knowledge. While ontologies and patterns have been combined before, to the best of our knowledge, no mining method has targeted data of such complexity.</p>
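      <p>The ontology-aware twist on the sub-graph isomorphism test mainly concerns label compatibility: a pattern vertex carrying a class matches a data vertex whose type is that class or one of its descendants (and similarly for property labels on edges). A minimal sketch of that compatibility check, reusing an is-a map as above:</p>
      <preformat>
def subsumes(is_a, general, specific):
    """True if `general` is `specific` itself or one of its ancestors
    (is_a maps a class to its parent, as in the sketch above)."""
    current = specific
    while True:
        if current == general:
            return True
        if current not in is_a:
            return False
        current = is_a[current]

def vertex_compatible(pattern_label, data_label, is_a):
    # A pattern vertex labelled 'antibiotic' matches a data vertex labelled
    # 'amoxicillin' whenever the ontology says the latter is-a the former.
    return subsumes(is_a, pattern_label, data_label)
</preformat>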
      <p>The downside of the approach is its sensitivity to the pattern frequency threshold and the related potential combinatorial explosion in the result. While this is a serious cost issue with graph patterns, possible mitigation strategies exist, e.g. using condensed representations thereof such as closed patterns [28].</p>
      <p>Overall, the expected immediate benefits of the ontological knowledge injection into the neural learning process include higher accuracy of the predictive architecture and faster convergence.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1"><label>1</label><mixed-citation>T. Gruber, et al., A translation approach to portable ontology specifications, Knowledge Acquisition 5 (1993) 199–220.</mixed-citation></ref>
      <ref id="ref2"><label>2</label><mixed-citation>F. Kramer, T. Beißbarth, Working with ontologies, in: Bioinformatics, Springer, 2017, pp. 123–135.</mixed-citation></ref>
      <ref id="ref3"><label>3</label><mixed-citation>A. Barbosa, et al., Modeling yield response to crop management using convolutional neural networks, Computers and Electronics in Agriculture 170 (2020) 105197.</mixed-citation></ref>
      <ref id="ref4"><label>4</label><mixed-citation>C. Frasco, et al., Towards an Effective Decision-making System based on Cow Profitability using Deep Learning, in: Proc. of the 12th ICAART, Valletta, Malta, 2020, pp. 949–958.</mixed-citation></ref>
      <ref id="ref5"><label>5</label><mixed-citation>A. Sheth, et al., Shades of knowledge-infused learning for enhancing deep learning, IEEE Internet Computing 23 (2019) 54–63.</mixed-citation></ref>
      <ref id="ref6"><label>6</label><mixed-citation>Y. Bengio, et al., Representation learning: A review and new perspectives, IEEE Transactions on PAMI 35 (2013) 1798–1828.</mixed-citation></ref>
      <ref id="ref7"><label>7</label><mixed-citation>D. Dou, et al., Semantic data mining: A survey of ontology-based approaches, in: IEEE ICSC 2015, IEEE, 2015, pp. 244–251.</mixed-citation></ref>
      <ref id="ref8"><label>8</label><mixed-citation>Q. Wang, et al., Knowledge graph embedding: A survey of approaches and applications, IEEE TKDE 29 (2017) 2724–2743.</mixed-citation></ref>
      <ref id="ref9"><label>9</label><mixed-citation>A. Bordes, et al., Translating embeddings for modeling multi-relational data, in: Advances in Neural Information Processing Systems, 2013, pp. 2787–2795.</mixed-citation></ref>
      <ref id="ref10"><label>10</label><mixed-citation>A. Bordes, et al., A semantic matching energy function for learning with multi-relational data: Application to word-sense disambiguation, Machine Learning 94 (2014) 233–259.</mixed-citation></ref>
      <ref id="ref11"><label>11</label><mixed-citation>P. Ristoski, H. Paulheim, RDF2vec: RDF graph embeddings for data mining, in: ISWC, Springer, 2016, pp. 498–514.</mixed-citation></ref>
      <ref id="ref12"><label>12</label><mixed-citation>C. Yin, et al., Domain knowledge guided deep learning with electronic health records, in: 2019 IEEE ICDM, IEEE, 2019, pp. 738–747.</mixed-citation></ref>
      <ref id="ref13"><label>13</label><mixed-citation>A. Jiménez, et al., Sound event classification using ontology-based neural networks, in: Proc. of the Annual Conference on NeurIPS, 2018.</mixed-citation></ref>
      <ref id="ref14"><label>14</label><mixed-citation>N. Phan, et al., Ontology-based deep learning for human behavior prediction with explanations in health social networks, Information Sciences 384 (2017) 298–313.</mixed-citation></ref>
      <ref id="ref15"><label>15</label><mixed-citation>M. Casteleiro, et al., Ontology learning with deep learning: a case study on patient safety using PubMed, in: SWAT4LS, 2016.</mixed-citation></ref>
      <ref id="ref16"><label>16</label><mixed-citation>P. Hohenecker, T. Lukasiewicz, Deep learning for ontology reasoning, CoRR (2017).</mixed-citation></ref>
      <ref id="ref17"><label>17</label><mixed-citation>B. Makni, J. Hendler, Deep learning for noise-tolerant RDFS reasoning, Semantic Web 10 (2019) 823–862.</mixed-citation></ref>
      <ref id="ref18"><label>18</label><mixed-citation>K. Xu, et al., How powerful are graph neural networks?, arXiv:1810.00826 [cs, stat] (2019). URL: http://arxiv.org/abs/1810.00826.</mixed-citation></ref>
      <ref id="ref19"><label>19</label><mixed-citation>H. Yuan, et al., XGNN: Towards model-level explanations of graph neural networks, in: Proc. of the 26th ACM SIGKDD, 2020, pp. 430–438.</mixed-citation></ref>
      <ref id="ref20"><label>20</label><mixed-citation>A. Vaswani, et al., Attention is all you need, in: Advances in Neural Information Processing Systems, 2017, pp. 5998–6008.</mixed-citation></ref>
      <ref id="ref21"><label>21</label><mixed-citation>C. Aggarwal, J. Han, Frequent Pattern Mining, 2014 ed., Springer, 2014.</mixed-citation></ref>
      <ref id="ref22"><label>22</label><mixed-citation>X. Yan, J. Han, gSpan: Graph-based substructure pattern mining, in: Proc. of the IEEE ICDM 2002, IEEE, 2002, pp. 721–724.</mixed-citation></ref>
      <ref id="ref23"><label>23</label><mixed-citation>M. Adda, et al., A framework for mining meaningful usage patterns within a semantically enhanced web portal, in: Proc. of the 3rd C* CCSE, 2010, pp. 138–147.</mixed-citation></ref>
      <ref id="ref24"><label>24</label><mixed-citation>A. Cakmak, G. Ozsoyoglu, Taxonomy-superimposed graph mining, in: Proc. of the 11th EDBT, ACM, 2008, pp. 217–228.</mixed-citation></ref>
      <ref id="ref25"><label>25</label><mixed-citation>S. Nijssen, J. Kok, Frequent graph mining and its application to molecular databases, in: IEEE Transactions on Systems, Man and Cybernetics, volume 5, IEEE, 2004, pp. 4571–4577.</mixed-citation></ref>
      <ref id="ref26"><label>26</label><mixed-citation>H. Saigo, et al., gBoost: a mathematical programming approach to graph classification and regression, Machine Learning 75 (2009) 69–89.</mixed-citation></ref>
      <ref id="ref27"><label>27</label><mixed-citation>E. Spyropoulou, et al., Mining interesting patterns in multi-relational data with n-ary relationships, in: International Conference on Discovery Science, Springer, 2013, pp. 217–232.</mixed-citation></ref>
      <ref id="ref28"><label>28</label><mixed-citation>X. Yan, J. Han, CloseGraph: mining closed frequent graph patterns, in: Proc. of the 9th ACM SIGKDD, ACM, 2003, pp. 286–295.</mixed-citation></ref>
    </ref-list>
  </back>
</article>