=Paper=
{{Paper
|id=Vol-2699/paper11
|storemode=property
|title=Leveraging a Domain Ontology in (Neural) Learning from Heterogeneous Data
|pdfUrl=https://ceur-ws.org/Vol-2699/paper11.pdf
|volume=Vol-2699
|authors=Tomas Martin,Petko Valtchev,Abdoulaye Baniré Diallo
|dblpUrl=https://dblp.org/rec/conf/cikm/MartinVD20
}}
==Leveraging a Domain Ontology in (Neural) Learning from Heterogeneous Data==
Tomas Martin, Petko Valtchev and Abdoulaye Baniré Diallo

Centre de Recherche en Intelligence Artificielle (CRIA), UQAM, Montreal, Canada

Abstract: Injecting domain knowledge into a neural learner to alleviate the reliance on high-quality data and to improve explainability is a rapidly expanding research trend. While most of the effort has focused on regular-topology formats such as sequences and grids, we consider graph datasets. Moreover, instead of the knowledge graph (KG) embedding that underlies the majority of graph-centered methods, we propose a dedicated pattern mining-based approach. As our patterns are ontologically generalized, they serve multiple objectives: domain knowledge infusion, enhanced generalization capacity, interpretability, etc.

Keywords: Domain Ontology, Symbolic methods, Sub-symbolic methods, Neural networks, Graph pattern mining

1. Introduction

Nowadays, implementing decision support systems to help practitioners in complex activities has become common practice in many fields. Traditionally, many of these systems have used machine learning to predict the outcome of a specific problem in the user's environment and have used the prediction to suggest concrete actions. Deep learning has arrived with a promise to expand the areas where automation is successfully applied in problem solving, hence the expectation that high-quality decision support will spread widely.

2. Motivation

Daily activities in the agro-industrial sector, e.g. the maintenance of a dairy farm, like those in other areas related to life sciences, generate large amounts of data. The underlying data sources reflect complementary aspects such as farm yield, environment, animal health, genetics, etc. The recent trend of precision agriculture looks at exploiting this data to support the decision making of domain stakeholders [3]: farmers, agronomists, dairy companies, insurers, etc.
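As a toy illustration of how such heterogeneous records could sit side by side in one graph-shaped dataset (all identifiers and labels below are invented for the sketch, not taken from the paper's actual data), a labelled multigraph can hold yield, health and genetics facts about a single animal:

```python
# Hypothetical sketch: a doubly-labelled multigraph holding heterogeneous
# dairy facts (health, genetics, production) about one animal.
from collections import defaultdict

class LabelledGraph:
    """Vertices carry type labels; edges carry property labels (multigraph)."""
    def __init__(self):
        self.vertex_type = {}
        self.edges = defaultdict(list)  # vertex -> [(property, vertex), ...]

    def add_vertex(self, v, vtype):
        self.vertex_type[v] = vtype

    def add_edge(self, u, prop, v):
        self.edges[u].append((prop, v))

g = LabelledGraph()
g.add_vertex("cow42", "Cow")
g.add_vertex("lact1", "Lactation")      # production aspect
g.add_vertex("amox", "Amoxicillin")     # animal-health aspect
g.add_vertex("sireA", "Cow")            # genetics aspect
g.add_edge("cow42", "hasLactation", "lact1")
g.add_edge("cow42", "treatedWith", "amox")
g.add_edge("cow42", "hasAncestor", "sireA")

print(g.vertex_type["cow42"], len(g.edges["cow42"]))
```

The point of the sketch is only that one vertex can anchor edges coming from several otherwise separate data sources.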
However, predicting or learning representations on such complex domains typically requires the availability of large amounts of data of sufficiently high quality. Unfortunately, in practice, such datasets are not always readily available. Conversely, large quantities of machine-readable expert knowledge often do exist and could potentially complement the available data. Since they reflect, at least partly, the expertise that underlies decision making in the field, it is only natural to look for ways to inject that knowledge into the learning process, so as to guide it and to compensate for the scarceness of high-quality data.

For several decades, ontologies, i.e. structured representations of domain concepts and their relations [1], have been promoted as the appropriate tool for making domain knowledge available for machine processing [2].

Yet, in order to be effective, any recommendation will have to reflect existing practices and, more generally, at least partly reflect the general knowledge of the domain. For instance, at the end of each lactation a cow goes dry for a while. Yet there is no straightforward way to train a neural model on milk yield data: the ensuing abrupt drop in milk yield is hard to digest for, at least, the most popular deep learning architectures [4]. Indeed, these models do not seem to properly grasp the dynamics of a cow's life-cycle, e.g. lactation, calvings, drying, etc.

While there are still work-arounds left to explore, one legitimate research question is whether injecting some domain knowledge would help here. In a broader approach, we investigate the impact of feeding complementary data, e.g. on genetics and animal health, and of organizing the overall dataset under a domain ontology (DO) that provides additional descriptive knowledge.

While supplementing a neural learner with domain knowledge stemming from an ontology is definitely appealing, it is also a challenging task, mainly due to the "impedance mismatch", i.e. the divergence in the respective levels of knowledge expression and manipulation [5].

Proceedings of the CIKM 2020 Workshops, October 19-20, Galway, Ireland.
email: martin.tomas@courrier.uqam.ca (T. Martin); valtchev.petko@uqam.ca (P. Valtchev); diallo.abdoulaye@uqam.ca (A.B. Diallo)
© 2020 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073.

3. Current State Of The Art

While symbolic representations, as a way to capture knowledge, have clearly dominated the AI field since its inception, sub-symbolic ones, in the form of trained neural networks, have recently gained rapidly in popularity and use [6]. By trading the discrete, man-made (i.e. modelling) entities of the former for machine-made (artificial) and loosely defined "patterns", the latter break free of prior knowledge in order to, arguably, benefit from a more powerful yet harder to interpret representation language. At their core, information is distilled throughout the network as a set of waves (or pulses) representing the captured knowledge.

In a broader scope, injecting domain knowledge into a machine learning process has been extensively researched and has proven helpful in many practical situations [7]. More recently, since deep learning moved centre stage, the focus has shifted to making neural networks collaborate with symbolic knowledge sources, mostly knowledge graphs (KG) and, somewhat more modestly, domain ontologies. In [5], the authors propose a classification of methods for feeding domain knowledge to artificial neural networks (ANNs), in particular deep ones. Their own proposal, called knowledge-infused learning (K-IL), addresses a variety of issues with ANNs, in particular reliance on large datasets of sufficient quality, biases in training data selection, complexity, etc. The proposed answer is a spectrum of fine-grained transformations of the ANN architecture reflecting the content of a KG, ranging from correcting the loss function to modifying the propagation through the network via connection weights.

The broader trend of using KGs in the form of embeddings of vertices, edges or both into a vector space, e.g. to support various natural language processing tasks, has been highly prolific for almost a decade now (see [8] for a somewhat outdated survey). While initial work by Bordes et al. [9, 10] looked at embedding a triple from a KG using energy-based methods to force plausible combinations of component embeddings, in [11] the focus is exclusively on vertices, i.e. domain entities. The proposed RDF2vec method generates a set of entity sequences, through random walks and iterative neighbourhood encoding techniques, which are then fed to word embedding methods. In a medical context, the authors of [12] present a somewhat different approach to leveraging a KG in neural learning: in order to assess patient risk from a series of health events, they translate the neighbourhoods of an event-centred KG into attention filters for an LSTM-based ANN.

Overall, KG embedding is not straightforwardly portable to DOs: in the case of KGs, amalgamation is favoured by them being on the same abstraction level as the training data. In contrast, classes and properties from a DO represent abstractions, i.e. sets of data objects and object-to-object links, hence the apparent mismatch with the instance-centered modus operandi of an ANN. Yet, given the strive for (proper) generalization in ANNs, the ontological structure, with its capacity to generalize along expert-validated conceptual hierarchies (and property hierarchies, for that matter), is a natural ally.

Nonetheless, a few studies have tackled the exploitation of generic knowledge from a DO in neural learning. For instance, in [13] the authors exploit a DO (a topic hierarchy, in actuality) of sound events to enhance a neural classifier. They propose to replicate the hierarchical structure of the DO in the ANN topology by: (1) allotting a layer per level in the is-a hierarchy and (2) enforcing fixed distance values between pairs of example embeddings, which roughly translate the examples' topological distance within the hierarchy. In a similar vein, the method in [14] simulates the topology of the DO graph in learning the representations of its classes and properties. A class is thus reduced to the union of its data properties and those of its sub-classes and related classes. In a following step, the method learns instance representations from the representations in the DO and uses them in behaviour prediction.

Besides, different ways of making ontologies and ANNs collaborate have been explored, e.g. ontology learning [15] or neural reasoning with ontologies [16]. For example, [17] approaches the latter task as a translation problem with noisy data.

On a broader scope, while feature vector-oriented ANNs have shined on sequence- and grid-shaped data, i.e. where the values are arguably more important than the (highly regular) topology, graph data, due to its inherent sparsity, requires more fine-grained generalization (e.g. chemical functional groups, biological pathways, telecom network configurations). Graph Convolutional Neural Networks (GCNNs) constitute a recent and promising approach for learning such regularities [18, 19]. By applying convolution layers on top of each other, they recursively aggregate n-th order neighbourhood information from the graph and can achieve good generalization on such datasets. Yet, due to their inherent bias toward frequent regularities, the very local, rare and context-specific ones will arguably be missed. And, clearly, this behaviour compounds whenever quality data prove scarce.
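The recursive n-th order neighbourhood aggregation behind stacked graph convolutions can be sketched in plain Python. This is a toy stand-in, not an actual GCNN: real layers use learned weight matrices and non-linearities, and the three-vertex graph below is invented for illustration.

```python
# Toy neighbourhood aggregation: each "layer" averages a vertex's scalar
# feature with those of its neighbours; stacking layers widens the horizon.

def aggregate(features, adjacency):
    """One convolution-like step over a {vertex: [neighbours]} graph."""
    new = {}
    for v, neigh in adjacency.items():
        vals = [features[v]] + [features[u] for u in neigh]
        new[v] = sum(vals) / len(vals)
    return new

def gcnn_like(features, adjacency, layers=2):
    """After k layers, a vertex has absorbed k-th order neighbourhood info."""
    for _ in range(layers):
        features = aggregate(features, adjacency)
    return features

# Path graph a - b - c: only "a" carries signal initially.
adjacency = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
features = {"a": 1.0, "b": 0.0, "c": 0.0}
print(gcnn_like(features, adjacency, layers=2))
```

After two layers, vertex c (two hops from a) receives a non-zero value, which is exactly the "recursive aggregation" the paragraph above describes; it also hints at the averaging-out of rare, local signal that the authors criticize.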
Beyond pure generalization capabilities, dealing with actionable and surprising patterns that mix different abstraction levels is to be expected: conceptually, a sequentially layered generalization procedure might not prove enough to extract such regularities.

Taking a step back, we consider three ongoing trends, each following a founding principle: (i) K-IL supports the use of external domain knowledge as a way to improve both predictive power and explainability; (ii) G(C)NN approaches consider preserving topology as critical when working on graph data; (iii) contextual mechanisms lead to better results on both static (e.g. text translation) and dynamic (e.g. user behaviour) predictive tasks [20].

To the best of our knowledge, no prior research has jointly addressed the above three concerns. Here, we present a novel approach for learning in complex domains that does so. By delegating most of the knowledge/pattern extraction effort to a dedicated symbolic method, we subsequently feed those patterns as input features to an architecture-agnostic neural learner, thus offering ontologically-generalized, graph-shaped features that a priori overlap with a GNN's convolved high-level patterns. Nevertheless, ontology-based generalization plays nicely with robustness properties: by going beyond a mere boolean encoding of attributes (i.e. vertices, edges) with the help of a DO's conceptual structure, it helps the symbolic learner avoid overfitting pitfalls.

4. Vision & Approach

First, by bringing some ontological concepts into the data as higher-order regularities, we aim to make explicit the shared conceptual structure that remains invisible in the raw data. The rationale is that while exact values may mismatch, more abstract types describing those values would coincide. For instance, two groups of lactating cows may be treated for mastitis (a common bacterial infection of the udder) using amoxicillin and penicillin, respectively. Now, knowing that these are both 𝛽-lactams helps extend a common sub-graph comprising, at least, nodes for cow and mastitis, with a further node for that class of antibiotics. Obviously, this increase in the shared portions of the data graphs w.r.t. their raw versions would not be possible without an ontology covering the antibiotics. In a more general vein, inserting typing information and property generalizations helps reveal hidden commonalities that would not be easily spotted by either a human expert or a sub-symbolic learner.

Next, our goal is to find all significant fragments of such shared structure in a set of data graphs. These include abstractions on both vertices (ontology classes) and edges (ontology properties). As an illustrative example, Figure 1 presents a possible pattern illustrating possible causes for a shorter than average first lactation of a young cow: here, frequently, both the young cow and its ancestor have been treated with different kinds of antibiotics.

Figure 1: An example of ontological graph pattern.

The resulting graph structures can be qualified as doubly-labelled (i.e. on both vertices and edges) multigraphs. Practically, we first discover the interesting patterns and then, in a feature engineering step, assign them as higher-level descriptors of the matching data graphs.

Another palpable advantage of using the ontology-based patterns is that they offer an integrated view of the shared structure: edges standing for properties connect class vertices, thus providing context to each of them. On the other hand, pattern components, as well as whole patterns, pertain to potentially varying abstraction levels.

More concretely, Figure 2 details our hybrid strategy, where graph patterns are first mined (step 3) and then fed (step 5) into a neural network (step 6), with graph data supported by a DO as our main input (steps 2 and 1, respectively), complementary to regular tabular data (step 4). The mining of ontological patterns from that graph data uses the domain ontology as the backbone for the exploration (e.g. ontological types, resources as vertices and properties as edges). Between steps 3 and 5, an optional post-processing step can further refine the patterns to emphasize contrasts, or synthesize by approximating, if required by the learning task. The resulting ontological graph patterns allow the original data to be encoded with the new features supported by domain knowledge before feeding the augmented data to the ANN (steps 5 and 6).
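The amoxicillin/penicillin example can be made concrete with a small sketch of ontological generalization followed by pattern-as-feature encoding. Everything below is a hypothetical simplification: the is-a hierarchy is three entries long, records are flat sets rather than the doubly-labelled multigraphs the approach actually mines, and all names are invented.

```python
# Toy is-a hierarchy: child class -> direct super-class.
IS_A = {
    "amoxicillin": "beta-lactam",
    "penicillin": "beta-lactam",
    "beta-lactam": "antibiotic",
}

def is_subclass(cls, ancestor):
    """Walk the is-a hierarchy upwards (reflexive: a class subsumes itself)."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = IS_A.get(cls)
    return False

def matches(record, pattern):
    """A record matches when every pattern class generalizes some record value."""
    return all(any(is_subclass(v, cls) for v in record) for cls in pattern)

def encode(records, patterns):
    """Feature engineering step: one boolean feature per ontological pattern."""
    return [[int(matches(r, p)) for p in patterns] for r in records]

records = [{"cow", "mastitis", "amoxicillin"},
           {"cow", "mastitis", "penicillin"}]
pattern = ["cow", "mastitis", "beta-lactam"]
print(encode(records, [pattern]))
```

Neither raw record contains the other's antibiotic, yet both match the generalized pattern, so both receive the same boolean feature — the "shared structure made explicit" that the paragraphs above argue for.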
Figure 2: High-level view of our hybrid learning process.

Pattern mining [21] aims at extracting recurrent data fragments, a.k.a. patterns, capturing the most relevant information possible. A mining task is defined by a pair of languages, one for data records and one for patterns, together with a relevance (interestingness) criterion. The typical criterion is frequency of appearance, but other criteria, such as utility or domain-related ones, are possible. Moreover, an effective mining method requires a general strategy for pattern space traversal and a technique to perform pattern-to-data record matching. The latter revolves around computing a variation of sub-graph isomorphism, here integrating the conceptual structure of an ontology. Typically, the former entails defining a spanning tree of the pattern space and a canonical representation of graph patterns, so as to avoid generating multiple copies of the same pattern [22].

Ontologies have been used in frequent pattern mining for some time to guide the exploration of complex pattern spaces such as sequences of objects or simple graphs [23, 24]. For predictive tasks exploiting graph pattern mining, a few successful techniques exist, such as quantitative structure-activity relationships (QSARs) [25], optimized objective functions [26] or dedicated pattern ranking metrics [27] exploiting external domain knowledge. While ontologies and patterns have been combined before, to the best of our knowledge, no mining method has targeted data of such complexity.

The downside of the approach is its sensitivity to the pattern frequency threshold and the related potential combinatorial explosion in the result. While this is a serious cost issue with graph patterns, possible mitigation strategies exist, e.g. using condensed representations thereof, such as closed patterns [28].

Overall, the expected immediate benefits of the ontological knowledge injection into the neural learning process include higher accuracy of the predictive architecture and faster convergence.

References

[1] T. Gruber, et al., A translation approach to portable ontology specifications, Knowledge Acquisition 5 (1993) 199–220.
[2] F. Kramer, T. Beißbarth, Working with ontologies, in: Bioinformatics, Springer, 2017, pp. 123–135.
[3] A. Barbosa, et al., Modeling yield response to crop management using convolutional neural networks, Computers and Electronics in Agriculture 170 (2020) 105197.
[4] C. Frasco, et al., Towards an effective decision-making system based on cow profitability using deep learning, in: Proc. of the 12th ICAART, Valletta, Malta, 2020, pp. 949–958.
[5] A. Sheth, et al., Shades of knowledge-infused learning for enhancing deep learning, IEEE Internet Computing 23 (2019) 54–63.
[6] Y. Bengio, et al., Representation learning: A review and new perspectives, IEEE Transactions on PAMI 35 (2013) 1798–1828.
[7] D. Dou, et al., Semantic data mining: A survey of ontology-based approaches, in: IEEE ICSC 2015, IEEE, 2015, pp. 244–251.
[8] Q. Wang, et al., Knowledge graph embedding: A survey of approaches and applications, IEEE TKDE 29 (2017) 2724–2743.
[9] A. Bordes, et al., Translating embeddings for modeling multi-relational data, in: Advances in Neural Information Processing Systems, 2013, pp. 2787–2795.
[10] A. Bordes, et al., A semantic matching energy function for learning with multi-relational data: Application to word-sense disambiguation, Machine Learning 94 (2014) 233–259.
[11] P. Ristoski, H. Paulheim, RDF2vec: RDF graph embeddings for data mining, in: ISWC, Springer, 2016, pp. 498–514.
[12] C. Yin, et al., Domain knowledge guided deep learning with electronic health records, in: 2019 IEEE ICDM, IEEE, 2019, pp. 738–747.
[13] A. Jiménez, et al., Sound event classification using ontology-based neural networks, in: Proc. of the Annual Conference on NeurIPS, 2018.
[14] N. Phan, et al., Ontology-based deep learning for human behavior prediction with explanations in health social networks, Information Sciences 384 (2017) 298–313.
[15] M. Casteleiro, et al., Ontology learning with deep learning: a case study on patient safety using PubMed, in: SWAT4LS, 2016.
[16] P. Hohenecker, T. Lukasiewicz, Deep learning for ontology reasoning, CoRR (2017).
[17] B. Makni, J. Hendler, Deep learning for noise-tolerant RDFS reasoning, Semantic Web 10 (2019) 823–862.
[18] K. Xu, et al., How powerful are graph neural networks?, arXiv:1810.00826 [cs, stat] (2019). URL: http://arxiv.org/abs/1810.00826.
[19] H. Yuan, et al., XGNN: Towards model-level explanations of graph neural networks, 26th ACM SIGKDD (2020) 430–438.
[20] A. Vaswani, et al., Attention is all you need, in: Advances in Neural Information Processing Systems, 2017, pp. 5998–6008.
[21] C. Aggarwal, J. Han, Frequent Pattern Mining, 2014 ed., Springer, 2014.
[22] X. Yan, J. Han, gSpan: Graph-based substructure pattern mining, in: Proc. of the IEEE ICDM 2002, IEEE, 2002, pp. 721–724.
[23] M. Adda, et al., A framework for mining meaningful usage patterns within a semantically enhanced web portal, in: Proc. of the 3rd C* CCSE, 2010, pp. 138–147.
[24] A. Cakmak, G. Ozsoyoglu, Taxonomy-superimposed graph mining, in: Proc. of the 11th EDBT, ACM, 2008, pp. 217–228.
[25] S. Nijssen, J. Kok, Frequent graph mining and its application to molecular databases, in: IEEE Transactions on Systems, Man and Cybernetics, volume 5, IEEE, 2004, pp. 4571–4577.
[26] H. Saigo, et al., gBoost: a mathematical programming approach to graph classification and regression, Machine Learning 75 (2009) 69–89.
[27] E. Spyropoulou, et al., Mining interesting patterns in multi-relational data with n-ary relationships, in: International Conference on Discovery Science, Springer, 2013, pp. 217–232.
[28] X. Yan, J. Han, CloseGraph: mining closed frequent graph patterns, in: Proc. of the 9th ACM SIGKDD, ACM, 2003, pp. 286–295.