<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Knowledge Infused Learning (K-IL): Towards Deep Incorporation of Knowledge in Deep Learning</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Ugur Kursuncu</string-name>
          <email>kursuncu@mailbox.sc.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Manas Gaur</string-name>
          <email>mgaur@email.sc.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Amit Sheth</string-name>
          <email>amit@sc.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>AI Institute, University of South Carolina Columbia</institution>
          ,
          <addr-line>SC</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2020</year>
      </pub-date>
      <fpage>23</fpage>
      <lpage>25</lpage>
      <abstract>
        <p>Learning the underlying patterns in data goes beyond instance-based generalization to external knowledge represented in structured graphs or networks. Deep learning that primarily constitutes the neural computing stream in AI has shown significant advances in probabilistically learning latent patterns using a multi-layered network of computational nodes (i.e., neurons/hidden units). Structured knowledge that underlies symbolic computing approaches and often supports reasoning has also seen significant growth in recent years, in the form of broad-based (e.g., DBPedia, Yago) and domain-, industry-, or application-specific knowledge graphs. A common substrate with careful integration of the two will raise opportunities to develop neuro-symbolic learning approaches for AI, where conceptual and probabilistic representations are combined. As the incorporation of external knowledge will aid in supervising the learning of features for the model, deep infusion of representational knowledge from knowledge graphs within hidden layers will further enhance the learning process. Although much work remains, we believe that knowledge graphs will play an increasing role in developing hybrid neuro-symbolic intelligent systems (bottom-up deep learning with top-down symbolic computing) as well as in building explainable AI systems for which knowledge graphs will provide scaffolding for punctuating neural computing. In this position paper, we describe our motivation for such a neuro-symbolic approach and framework that combines knowledge graphs and neural networks.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Data-driven bottom-up machine/deep learning (ML) and
top-down knowledge-driven approaches to creating reliable
models, have shown remarkable success in specific areas,
such as search, speech recognition, language translation,
computer vision, and autonomous vehicles. On the other
hand, they have had limited success in understanding and
deciphering contextual information, such as detection of
abstract concepts in online/offline human interactions. Current
challenges in the translation of research methods and
resources into practice often draw from a class of rarely
studied problems that do not yield to contemporary bottom-up
ML methods. Policymakers and practitioners assert serious
usability concerns that constrain adoption, notably in
high-consequence domains
        <xref ref-type="bibr" rid="ref80">(Topol 2019)</xref>
        . In most cases,
data-dependent ML algorithms require high computing power
and large datasets, where the crucial signals may still be
sparse or ambiguous, threatening precision
        <xref ref-type="bibr" rid="ref14">(Cheng 2018)</xref>
        .
Moreover, the ML models that are deployed in the
absence of transparency and accountability
        <xref ref-type="bibr" rid="ref65">(Rudin 2019)</xref>
        and
trained on biased datasets, can lead to grave consequences,
such as potential social discrimination and unfair treatment
        <xref ref-type="bibr" rid="ref58">(Olteanu et al. 2019)</xref>
        . Further, the potentially severe
implications of false alarms in an ML-integrated real-world
application may affect millions of people
        <xref ref-type="bibr" rid="ref22 ref23 ref40 ref41 ref42 ref44 ref58 ref73">(Kursuncu et al. 2019a;
Kursuncu 2018)</xref>
        .
      </p>
      <p>
        The fundamental challenges are common to a majority of
problems in a variety of domains with real-world impact.
Specifically, these challenges are: (1) dependency on large
datasets required for bottom-up, data-dependent ML
algorithms
        <xref ref-type="bibr" rid="ref17 ref81">(Valiant 2000; De Palma, Kiani, and Lloyd 2019)</xref>
        ,
(2) bias in the dataset, enabling the model to
potentially cause social discrimination and unfair treatment, (3)
multidimensionality, ambiguity and sparsity, as the data
involves unconstrained concepts and relationships with
meaning from different contextual dimensions of the content such
as religion, history and politics
        <xref ref-type="bibr" rid="ref22 ref23 ref40 ref41 ref42 ref44 ref58 ref73">(Kursuncu et al. 2019a;
Kursuncu 2018)</xref>
        . Further, the limited number of labeled
instances available for training may fail to represent the true
nature of concepts and relationships in data sets, leading to
ambiguous or sparse true signals, (4) the lack of information
traceability for model explainability, (5) the coverage of
information specific to a domain that would be missed
otherwise, (6) the complexity of model architecture in time and
space<sup>1</sup>, and (7) false alarms in model performance.
Consequently, we believe standard, separate knowledge graph (KG)
and ML methods are vulnerable to deducing or learning spurious
concepts and relationships that appear deceptively good on
a KG or training dataset, yet do not provide adequate results
when the data set contains contextual and dynamically
changing concepts and relations.
      </p>
      <sec id="sec-1-1">
        <title>1: https://www.theguardian.com/commentisfree/2019/nov/16/can-planet-afford-exorbitant-power-demands-of-machinelearning</title>
        <p>
          In this position paper, we describe innovations that will
operationalize more abstract models built upon the
characteristics of a domain to render them computationally
accessible within neural network architectures. We propose
a neuro-symbolic method, knowledge-infused learning that
measures information loss in latent features learned by
neural networks through KGs with conceptual and structural
relationship information, for addressing the aforementioned
challenges. The infusion of knowledge during the
representation learning phase raises the following central research
questions: (i) How do we decide whether to infuse
knowledge or not, at a particular stage while learning between
layers, and how to quantify knowledge to be infused? (ii)
How to merge latent representations between layers with
external knowledge representations, and (iii) How to
propagate the knowledge through the learned latent
representation? Considering the future deployment of AI in
applications, the potential impact of this approach is significant.
As stated in
          <xref ref-type="bibr" rid="ref37">(Karpathy 2015)</xref>
          , the deeper the network, the
denser the representation and the better the learning. A large
number of parameters and the layered nature of neural
networks make them modifiable based on specific problem
characteristics. However, the challenges (1, 3, 5 and 7) make
neural networks vulnerable to the sudden appearance of
relevant-but-sparse or ambiguous features, in often noisy
big data
          <xref ref-type="bibr" rid="ref13 ref17 ref17 ref41 ref42 ref81">(Valiant 2000; De Palma, Kiani, and Lloyd 2019;
Kursuncu et al. 2019b)</xref>
          . On the other hand, KG-based
approaches structure search within a feature space defined by
domain experts. To compensate for the vulnerability of the
aforementioned challenges, incorporating knowledge into the
learned representation in a principled fashion is required. A
promising approach is to base this on a measurable
discrepancy between the knowledge captured in the neural network
and external resources.
        </p>
        <p>
          Computational modeling coupled with knowledge
infusion in a neural network will disambiguate important
concepts defined in a KG with their different semantic meanings
through its structural relations. Knowledge infusion will
redefine the emphasis of sparse but essential, and irrelevant
but frequently occurring terms and concepts, boosting
recall without reducing precision. Further, it will provide
explanatory insight into the model, robustness to noise and
reduce dependency on frequency in the learning process. This
neuro-symbolic learning approach will potentially transform
existing methods for data analysis and building
computational models. While the impact of this approach is
transferable (and replicable) to a majority of domains, the
explicit implications are particularly apparent for social
science
          <xref ref-type="bibr" rid="ref23 ref41 ref42 ref58 ref73">(Kursuncu et al. 2019a)</xref>
          and healthcare domains
          <xref ref-type="bibr" rid="ref22 ref3 ref4 ref40">(Gaur
et al. 2018)</xref>
          .
        </p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>Related Work</title>
      <p>As the incorporation of knowledge has been explored in
various forms in prior research, in this section, we describe
the methodologies and applications specifically related to
knowledge-infused learning: Neural language models,
neural attention models, and knowledge-based neural networks, all
of which utilize external knowledge before/after the
representation has been generated.</p>
      <sec id="sec-2-1">
        <title>Neural Language Models (NLMs)</title>
        <p>
          NLMs are a category of neural networks capable of
learning sequential dependencies in a sentence and preserving such
information while learning a representation. In particular,
LSTM (Long Short Term Memory) networks
          <xref ref-type="bibr" rid="ref31">(Hochreiter
and Schmidhuber 1997)</xref>
          have emerged from the failure of
RNNs (Recurrent Neural Networks) in remembering
long-term information. Concerning the loss of contextual
information while learning,
          <xref ref-type="bibr" rid="ref15">(Cho et al. 2014)</xref>
          proposed a
context feed-forward LSTM architecture in which context
learned by the previous layer is merged with the forgetting and
modulation gates of the next layer. However, if erroneous
contextual information is learned in previous layers, it is difficult
to correct
          <xref ref-type="bibr" rid="ref51">(Masse, Grant, and Freedman 2018)</xref>
          , which is a
problem magnified by noisy data and content sparsity (e.g.
Twitter, Reddit, Blogs).
        </p>
        <p>
          As the inclusion of structured knowledge (e.g.,
Knowledge Graphs) in deep learning improves information
retrieval
          <xref ref-type="bibr" rid="ref70">(Sheth and Kapanipathi 2016)</xref>
          , prior research has
shown the significance of knowledge in the pursuit of
improving NLMs, such as in commonsense reasoning
          <xref ref-type="bibr" rid="ref46">(Liu and
Singh 2004)</xref>
          . The transformer NLMs such as BERT,
          <xref ref-type="bibr" rid="ref18">(Devlin et al. 2018)</xref>
          (including its variants BioBert and
SciBERT), are still data dependent. BERT has been utilized in
hybrid frameworks such as
          <xref ref-type="bibr" rid="ref68">(Scarlini, Pasini, and Navigli
2020)</xref>
          in the creation of sense embeddings using BabelNet
and NASARI.
          <xref ref-type="bibr" rid="ref23 ref47 ref48 ref58 ref73">(Liu et al. 2019a)</xref>
proposed K-BERT, which
enriches the representations by injecting the triples from KGs
into the sentence. As this incorporation of knowledge for
BERT takes place in the form of attention, we consider the
K-BERT as semi-deep infusion
          <xref ref-type="bibr" rid="ref73">(Sheth et al. 2019)</xref>
          .
Similarly, ERNIE
          <xref ref-type="bibr" rid="ref76">(Sun et al. 2019)</xref>
          incorporated external
knowledge to capture lexical, syntactic, and semantic information,
enriching BERT.
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>Neural Attention Models (NAM)</title>
        <p>
          NAM
          <xref ref-type="bibr" rid="ref66">(Rush, Chopra, and Weston 2015)</xref>
          highlights
particular features that are important for pattern
recognition/classification based on a hierarchical architecture. The
manipulation of attentional focus is effective in solving real
world problems involving massive amounts of data
          <xref ref-type="bibr" rid="ref28 ref54 ref75">(Halevy,
Norvig, and Pereira 2009; Sun et al. 2017)</xref>
          . On the other
hand, some applications demonstrate the limitation of
attentional manipulation in a set of problems such as
sentiment (mis)classification
          <xref ref-type="bibr" rid="ref52">(Maurya 2018)</xref>
          and suicide risk
          <xref ref-type="bibr" rid="ref16">(Corbitt-Hall et al. 2016)</xref>
          , where feature presence is
inherently ambiguous, just as in the online radicalization
problem
          <xref ref-type="bibr" rid="ref23 ref41 ref42 ref58 ref73">(Kursuncu et al. 2019a)</xref>
          . For example, in the suicide
risk prediction task, references to suicide-related
terminology appear in social media posts of both victims as well
as supportive listeners, and the existing NAMs fail to
capture semantic relations between terms that help
differentiate the suicidal user from a supportive user
          <xref ref-type="bibr" rid="ref23 ref41 ref42 ref73">(Gaur et al.
2019)</xref>
          . To overcome such limitations in a sentiment
classification task, (Vo et al. 2017) adds sentiment scores into
the feature set for enhancing the learned representation and
modifies the loss function to respond to values of the
sentiment score during learning. However,
          <xref ref-type="bibr" rid="ref38 ref72">(Sheth et al. 2017;
Kho et al. 2019)</xref>
          have pointed out the importance of using
domain-specific knowledge especially in cases where the
problem is complex in nature
          <xref ref-type="bibr" rid="ref62">(Perera et al. 2016)</xref>
          .
          <xref ref-type="bibr" rid="ref24 ref49 ref5">(Bian,
Gao, and Liu 2014)</xref>
          has empirically demonstrated the
effectiveness of combining richer semantics from domain
knowledge with morphological and syntactic knowledge in the
text, by modeling knowledge as an auxiliary task that
regularizes the learning of the main objective in a deep neural
network.
        </p>
        <p>
          Knowledge-based Neural Networks
          <xref ref-type="bibr" rid="ref84">(Yi et al. 2018)</xref>
          introduced a knowledge-based, recurrent
attention neural network (KB-RANN) that modifies the
attentional mechanism by incorporating domain knowledge
to improve model generalization. However, their
domain knowledge is statistically derivable from the input data itself
and is analogous to merely learning an interpolation function
over the existing data.
          <xref ref-type="bibr" rid="ref20">(Dugas et al. 2009)</xref>
          proposed a
modification in the neural network by adopting Lipschitz functions
for its activation function.
          <xref ref-type="bibr" rid="ref33">(Hu et al. 2016)</xref>
          proposed a
combination of deep neural networks with logic rules by
employing a knowledge distillation procedure
          <xref ref-type="bibr" rid="ref30">(Hinton, Vinyals, and
Dean 2015)</xref>
of transferring the learned tacit knowledge from a
larger neural network to the weights of a smaller neural
network in data-limited settings. These studies for
incorporating knowledge in a deep learning framework have not
involved declarative knowledge structures in the form of KGs
(e.g., DBpedia)
          <xref ref-type="bibr" rid="ref13">(Chen et al. 2019)</xref>
          . However, (Casteleiro et
al. 2018) recently showed how the Cardiovascular Disease
Ontology (CDO) provided context and reduced ambiguity,
improving performance on a synonym detection task.
          <xref ref-type="bibr" rid="ref69">(Shen
et al. 2018)</xref>
          employed embeddings of entities in a KG,
derived through Bi-LSTMs, to enhance the efficacy of NAMs.
          <xref ref-type="bibr" rid="ref67">(Sarker et al. 2017)</xref>
          presented a conceptual framework for
explaining artificial neural networks’ classification behavior
using background knowledge on the semantic web. (Makni
and Hendler) presented a deep learning approach to learn
RDFS (https://www.w3.org/2001/sw/wiki/RDFS) rules from both synthetic and real-world
semantic web data. They also claim their approach improves the
noise-tolerance capabilities of RDFS reasoning.
        </p>
        <p>
          All of the frameworks in the above subsections utilized
external knowledge before or after the representation has
been generated by NAMs, rather than within the deep neural
network as in our approach
          <xref ref-type="bibr" rid="ref73">(Sheth et al. 2019)</xref>
          . We propose
a learning framework that infuses domain knowledge within
the latent layers of neural networks for modeling.
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Preliminaries</title>
      <p>Symbolic representation of a domain, besides its
probabilistic representation, is crucial for neuro-symbolic learning. In
our approach, we propose to homogenize symbolic
information from KGs (see Section Knowledge Graphs) and
contextual neural representations (see Section Contextual
Modeling), in neural networks.</p>
      <sec id="sec-3-1">
        <title>Knowledge Graphs</title>
        <p>
          A knowledge graph (KG) is a conceptual model of a
domain that stores and structures declarative knowledge in a
human and machine-readable format, constituting factual
ground truth and embodying a domain ontology of objects,
attributes, and relations. KGs rely on symbolic propositions,
employing generic conceptual relationships in taxonomies,
partonomies and specific content with labeled links.
Examples include DBpedia, UMLS, and ICD-10. The factual
information about the domain is represented in the form of
instances (or individuals) of those concepts (or classes) and
relationships
          <xref ref-type="bibr" rid="ref27 ref71">(Gruber 2008; Sheth and Thirunarayan 2012)</xref>
          .
Therefore, a domain can be described or modeled through
KGs in a way that both computers and humans can
understand. As KGs differentiate contextual nuances of concepts
in the content, they play a key role in our framework with
extensive use by several functions.
        </p>
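        <p>As a toy illustration of this triple-based structure (all entity and relation names below are hypothetical, not drawn from any of the KGs named above), a small domain KG can be stored as subject-predicate-object triples and indexed for traversal:</p>
        <preformat>
```python
from collections import defaultdict

# A toy domain KG as subject-predicate-object triples (hypothetical entries).
triples = [
    ("Depression", "isA", "MentalDisorder"),
    ("Depression", "hasSymptom", "Insomnia"),
    ("Insomnia", "isA", "SleepDisorder"),
    ("MentalDisorder", "definedIn", "DSM-5"),
]

# Index the graph for traversal: entity -> [(relation, object), ...]
graph = defaultdict(list)
for s, p, o in triples:
    graph[s].append((p, o))

def neighbors(entity):
    """Return the directly related (relation, entity) pairs of a concept."""
    return graph[entity]

print(neighbors("Depression"))
```
        </preformat>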
        <sec id="sec-3-1-1">
          <title>Contextual Modeling</title>
          <p>
            Capturing contextual cues in the language is crucial in our
approach; hence, we utilize NLMs to generate embeddings
of the content. Recent embedding algorithms have emerged
to create such representations such as Word2Vec
            <xref ref-type="bibr" rid="ref24">(Goldberg
and Levy 2014)</xref>
            , GloVe
            <xref ref-type="bibr" rid="ref24 ref61">(Pennington, Socher, and Manning
2014)</xref>
            , FastText (Athiwaratkun, Wilson, and Anandkumar
            <xref ref-type="bibr" rid="ref1 ref74">2018) and BERT (Devlin et al. 2018</xref>
            ).
          </p>
          <p>
            Modeling context-sensitive problems in different domains
(e.g., healthcare, cyber social threats, online extremism and
harassment), depends heavily on carefully designed features
to extract meaningful information, based on characteristics
of the problems and a ground truth dataset. Moreover,
identifying these characteristics and differentiating the content
requires different levels of granularity in the organization of
features. For instance, in the problem of online Islamist
extremism, the information being shared in social media posts
by users in extremism-related social networks displays an
intent that depends on the user’s type (e.g., recruiter,
follower). Hence, as these user types show different
characteristics
            <xref ref-type="bibr" rid="ref22 ref40 ref44">(Kursuncu et al. 2018)</xref>
            , for reliable analysis, it is
critical to consider different contextual dimensions
            <xref ref-type="bibr" rid="ref22 ref23 ref40 ref41 ref42 ref44 ref58 ref73">(Kursuncu et
al. 2019a; Kursuncu 2018)</xref>
            . Moreover, the ambiguity of
diagnostic terms (e.g., jihad) also mandates representation of
terms in different contexts. Hence, to better reflect these
differences, creating multiple models enables us to represent
the multiple contextual dimensions for a reliable analysis.
Figure 1 details the contextual dimension modeling
workflow.
          </p>
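          <p>To make the per-dimension modeling concrete, here is a minimal sketch (the dimension names and embedding values are invented stand-ins for separately trained NLMs) showing how the same ambiguous term receives a different representation under each contextual dimension:</p>
          <preformat>
```python
# Hypothetical per-dimension embedding lookups; in practice each
# dictionary would come from a separately trained contextual model.
dimensions = {
    "religion": {"jihad": [0.9, 0.1], "prayer": [0.8, 0.2]},
    "politics": {"jihad": [0.2, 0.7], "state": [0.1, 0.9]},
}

def embed(tokens, dimension):
    """Average the embeddings of known tokens under one contextual dimension."""
    table = dimensions[dimension]
    vecs = [table[t] for t in tokens if t in table]
    if not vecs:
        return None
    size = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(size)]

# The same token gets a different vector per dimension, reflecting
# the ambiguity of diagnostic terms such as "jihad".
print(embed(["jihad"], "religion"))
print(embed(["jihad"], "politics"))
```
          </preformat>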
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>A Proposed Comprehensive Approach</title>
      <p>
        Although the existing research (Gaur et al.
        <xref ref-type="bibr" rid="ref1 ref74">2018; Bhatt et
al. 2018a</xref>
        ) shows the contribution of incorporating external
knowledge in ML, this incorporation mostly takes place
before or after the actual learning process (e.g., feature
extraction, validation); thus remaining shallow. We believe that
deep knowledge infusion, within the hidden layers of
neural networks, will greatly improve the performance by: (i)
reducing false alarms and information loss, (ii) boosting
recall without sacrificing precision, (iii) providing finer
granular representation, (iv) enabling explainability
        <xref ref-type="bibr" rid="ref35 ref41 ref42 ref58 ref65">(Islam et al.
2019; Kursuncu et al. 2019c)</xref>
        and (v) reducing bias.
Specifically, we believe that it will become a critical and integral
component of AI models that are integrated in deployed
tools, e.g., in healthcare, where domain knowledge is
crucial and indispensable in decision making processes.
Fortunately, these domains are rich in terms of their
respective machine-readable knowledge resources, such as
manually curated medical KGs (e.g., UMLS
        <xref ref-type="bibr" rid="ref53 ref54">(McInnes, Pedersen,
and Pakhomov 2009)</xref>
        , ICD-10
        <xref ref-type="bibr" rid="ref9">(Brouch 2000)</xref>
        and DataMed
        <xref ref-type="bibr" rid="ref56">(Ohno-Machado et al. 2017)</xref>
        ). In our prior research
        <xref ref-type="bibr" rid="ref22 ref3 ref4 ref40">(Gaur
et al. 2018)</xref>
        , we utilized ML models coupled with these
KGs to predict mental health disorders among 20 Mental
Disorders (defined in the DSM-5) for Reddit posts. Typical
approaches for such predictions employ word embeddings,
such as Word2Vec, resulting in sub-optimal performance
when they are used in domain-specific tasks. We have
incorporated knowledge into the embeddings of Reddit posts
by (i) using Zero Shot learning
        <xref ref-type="bibr" rid="ref60">(Palatucci et al. 2009)</xref>
        , (ii)
modulating (e.g., re-weighting) their embeddings, similar
to NAMs, and obtained a significant reduction in the false
alarm rate, from 13% (without knowledge) to 2.5% (with
knowledge). In another study, we have leveraged the domain
knowledge in KGs to validate model weights that explain
diverse crowd behavior in the Fantasy Premier League
participants (FPL)
        <xref ref-type="bibr" rid="ref3 ref4">(Bhatt et al. 2018b)</xref>
        . However, very little
previous work has tried to integrate such functional knowledge into
an existing deep learning framework.
      </p>
      <p>We propose to further develop an innovative deep
knowledge-infused learning approach that will reveal
patterns that are missed by traditional approaches because of
sparse feature occurrence, feature ambiguity and noise. This
approach will support the following integrated aims: (i)
Infusion of Declarative Domain Knowledge in a Deep
Learning framework, and (ii) Optimal Sub-Knowledge Graph
Creation and Evolution. The overall architecture in Figure
2 guides our proposed research on these two aims. Our
methods will disambiguate important concepts defined in
the respective KGs with their different semantic meanings
through its structural relations. Knowledge infusion will
redefine the emphasis of sparse-but-essential, and
irrelevant-but-frequently-occurring terms and concepts, boosting recall
without reducing precision.</p>
      <sec id="sec-4-1">
        <title>Knowledge-Infused Learning</title>
        <p>Each layer in a neural network architecture produces a
latent representation of the input vector (ht). The infusion of
knowledge during the representation learning phase raises
the following central research questions: R1: How do we
decide whether to infuse knowledge or not, at a particular
stage while learning between layers, and how to quantify
knowledge to be infused? R2: How to merge latent
representations between layers with external knowledge
representations, and R3: How to propagate the knowledge through
the learned latent representation? We propose to define two
functions to address these two questions: Knowledge-Aware
Loss Function (K-LF) and Knowledge Modulation Function
(K-MF), respectively.</p>
        <p>Configurations of neural networks can be designed in
various ways depending on the problem. As our aim is to
infuse knowledge within the neural network, such an
operation can take place (i) before the output layer (e.g.,
SoftMax), (ii) between hidden layers (e.g., reinforcing the gates
of an NLM layer, modulating the hidden states of NLM
layers, Knowledge-driven NLM dropout and recurrent dropout
between layers). To illustrate (i), we describe our initial
approach to neural language models that infuses knowledge
before the output layer, which we believe will shed light
on a reliable and robust solution with more research
and rigorous experimentation.</p>
      </sec>
      <sec id="sec-4-2">
        <title>Seeded Sub-Knowledge Graph</title>
        <p>
          The Seeded Sub-Knowledge Graph is a subset of KGs, which participates
broadly in our technical approach. Generic KGs (e.g.,
DBpedia
          <xref ref-type="bibr" rid="ref6">(Bizer et al. 2009)</xref>
          , YAGO2 (Hoffart et al. 2013),
Freebase
          <xref ref-type="bibr" rid="ref7">(Bollacker et al. 2008)</xref>
          ) may contain over a
million entities and close to a billion relationships. Using
the entire graph of linked data on the web can cause: (1)
unnecessary computation and (2) noise due to irrelevant
knowledge, and has sometimes failed to benefit
intelligent applications
          <xref ref-type="bibr" rid="ref21 ref63 ref82">(Roy, Park, and Pan 2017)</xref>
          . However,
real-world problems are domain-specific and require only
a relevant (sub) portion of the full graph. Creation of a
Seeded Sub-KG
          <xref ref-type="bibr" rid="ref45">(Lalithsena 2018)</xref>
          based on a ground truth
dataset is needed, to represent a particular domain using
information-theoretic approaches (e.g., KL divergence)
and probabilistic soft logic
          <xref ref-type="bibr" rid="ref39">(Kimmig et al. 2012)</xref>
          . Further,
a sub-graph discovery approach
          <xref ref-type="bibr" rid="ref10 ref45">(Cameron et al. 2015;
Lalithsena 2018)</xref>
          can also be used utilizing probabilistic
graphical models (e.g., deep belief networks, conditional
random fields). In our approach, the Seeded SubKG will be
updated with more knowledge based on the difference between
the learned representation and relevant knowledge
representation from the KG (see Section Differential Knowledge
Engine).
        </p>
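        <p>As a hedged illustration of the information-theoretic selection mentioned above (the concept names and distributions are invented for the example), a candidate sub-graph can be scored by the KL divergence between its concept distribution and that of the ground-truth corpus, keeping the least divergent candidate:</p>
        <preformat>
```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(P || Q) for two discrete distributions over the same support."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

# Hypothetical concept distributions: how often each concept appears in
# the ground-truth dataset vs. in each candidate sub-graph neighborhood.
concepts = ["depression", "insomnia", "football", "anxiety"]
corpus_dist = [0.4, 0.3, 0.0, 0.3]

candidate_subgraphs = {
    "clinical": [0.35, 0.3, 0.05, 0.3],
    "sports":   [0.05, 0.0, 0.9, 0.05],
}

# Keep the candidate whose concept distribution diverges least from the
# corpus distribution; this seeds the sub-KG with domain-relevant knowledge.
best = min(candidate_subgraphs,
           key=lambda k: kl_divergence(corpus_dist, candidate_subgraphs[k]))
print(best)
```
        </preformat>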
      </sec>
      <sec id="sec-4-3">
        <title>Ke: Knowledge Embedding Creation</title>
        <p>The representation of knowledge in the Seeded SubKG will be generated as
embedding vectors. Specific contextual dimension models
and/or more generic models can be utilized to create an
embedding of each concept and their relations in the Seeded
SubKG. Unlike traditional approaches that compute the
representation of each concept in the KGs by simply taking an
average of embedding vectors of concepts, we leverage the
existing structural information of the graph. This procedure
is formally defined:</p>
        <p>Ke = Σij [Ci; Cj] ⊙ Dij (1)</p>
        <p>
          where Ke is the representation of the concepts enriched
by the relationships in the Seeded-KG, (Ci, Cj) is a
relevant pair of concepts in the Seeded-KG, and Dij is the distance
measure (e.g., Least Common Subsumer
          <xref ref-type="bibr" rid="ref2">(Baader, Sertkaya,
and Turhan 2007)</xref>
          ) between the two concepts Ci and Cj .
Novel methods will further be examined building upon this
initial approach above as well as existing tools that include
TRANS-E
          <xref ref-type="bibr" rid="ref8">(Bordes et al. 2013)</xref>
          , TRANS-H
          <xref ref-type="bibr" rid="ref83">(Wang et al.
2014)</xref>
          , and HOLE
          <xref ref-type="bibr" rid="ref55">(Nickel et al. 2016)</xref>
          for the creation of
embeddings from KGs.
        </p>
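        <p>A minimal sketch of this computation follows (the two-dimensional concept vectors and the single pair weight are hypothetical; in practice Dij would come from a measure such as the Least Common Subsumer over the Seeded SubKG):</p>
        <preformat>
```python
# Toy concept vectors (hypothetical values standing in for embeddings
# produced by the contextual dimension models).
concept_vecs = {
    "depression": [0.4, 0.1],
    "insomnia":   [0.3, 0.2],
}

# Hypothetical distance-derived scalar weight for one relevant pair.
pair_weights = {("depression", "insomnia"): 0.5}

def knowledge_embedding(vecs, weights):
    """Sum over relevant pairs of the concatenation [Ci; Cj] scaled by Dij."""
    ke = None
    for (ci, cj), dij in weights.items():
        concat = vecs[ci] + vecs[cj]          # concatenation [Ci; Cj]
        scaled = [dij * x for x in concat]    # elementwise scaling by Dij
        ke = scaled if ke is None else [a + b for a, b in zip(ke, scaled)]
    return ke

print(knowledge_embedding(concept_vecs, pair_weights))
```
        </preformat>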
      </sec>
      <sec id="sec-4-4">
        <title>Knowledge Infusion Layer In a many-to-one NLM</title>
        <p>
          (Shivakumar et al. 2018) network with T hidden layers,
the Tth layer contains the learned representation before the
Algorithm 1 takes the type of neural language model,
number of epochs, iterations and the seeded knowledge
graph embedding Ke as input, and returns a knowledge
infused representation of the hidden state MT. In line 4, the
infusion of knowledge takes place after each epoch without
obstructing the learning of the vanilla NLM model and is
explained in lines 5-10. Within the knowledge infusion process
(lines 7-9), we optimize the loss function in Equation 2 with
the convergence condition defined as the reduction in the
difference between the DKL of hT and hT−1 in the presence
of Ke. Considering the vanilla structure of an NLM
          <xref ref-type="bibr" rid="ref25">(Greff et
al. 2017)</xref>
          , MT is utilized by the fully connected layer for
classification.
        </p>
        <p>To illustrate an initial approach in Figure 3, we use
LSTMs as NLMs in our neural network. K-IL functions add
an additional layer before the output layer of our proposed
architecture. The output layer (e.g., SoftMax) of the NLM
model estimates the error to be back-propagated. While
techniques for knowledge infusion between hidden layers or
just before the output layer will be explored, in this
subsection we explain the Knowledge Infusion Layer (K-IL),
which takes place just before the output layer.</p>
        <p>Algorithm 1 Routine for Infusion of Knowledge in NLMs
1: procedure KNOWLEDGEINFUSION
2:   Data: NLM_type; #Epochs; #Iter; K_e
3:   Output: M_T
4:   for n_e = 1 to #Epochs do
5:     h_T, h_{T-1} ← ...
6:     ...
7:     while D_KL(h_T || K_e) &gt; ε do
8-10:    ...</p>
        <p>This layer takes the latent
vector of the penultimate layer (h_{T-1}), the latent vector of the
last hidden layer (h_T), and the knowledge embedding (K_e)
as input.</p>
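The per-epoch infusion step of Algorithm 1 can be sketched as a small numeric toy: iterate on the last hidden state until the KL constraint of Equation 2 holds. This is an illustrative sketch, not the authors' implementation; the softmax normalization, learning rate, and gradient-descent update rule are all our assumptions.

```python
import numpy as np

def softmax(v):
    """Map a real-valued vector to a probability distribution."""
    e = np.exp(v - v.max())
    return e / e.sum()

def kl_divergence(p, q, eps=1e-12):
    """D_KL(p || q) for probability vectors p and q."""
    p, q = p + eps, q + eps
    return float(np.sum(p * np.log(p / q)))

def knowledge_infusion(h_T, h_T1, K_e, lr=0.1, max_iter=500):
    """Toy sketch of Algorithm 1's inner loop: nudge the last hidden
    state h_T toward the knowledge embedding K_e until
    D_KL(h_T || K_e) < D_KL(h_{T-1} || K_e), the constraint of Eq. 2."""
    target = softmax(K_e)
    bound = kl_divergence(softmax(h_T1), target)  # D_KL(h_{T-1} || K_e)
    h = h_T.astype(float).copy()
    for _ in range(max_iter):
        p = softmax(h)
        d = kl_divergence(p, target)
        if d < bound:  # Eq. 2 constraint satisfied; stop infusing
            break
        # gradient of D_KL(softmax(h) || target) with respect to logits h
        h -= lr * p * (np.log(p / target) - d)
    return h  # knowledge-infused hidden state
```

The loop leaves the vanilla NLM untouched and only post-processes its hidden state, matching the "without obstructing the learning" property described above.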
        <p>In this layer, we define two particular functions that will
be critical for merging the latent vectors from the hidden
layers and the knowledge embedding vector from the KG.
Note that the dimensions of these vectors are the same
because they are created from the same models (e.g.,
contextual models), which makes the merge operation of those
vectors possible and valid.</p>
        <p>
          K-LF: Knowledge-Aware Loss Function In neural
networks, hidden layers may de-emphasize important patterns
due to the sparsity of certain features during learning, which
causes information loss. In some cases, such patterns may
not even appear in the data. However, such relations or
patterns may be defined in KGs along with even more relevant
knowledge. We call this information gap between the learned
representation of the data and the knowledge representation
differential knowledge. Information loss in a learning process
is relative to the distribution that suffered the loss. Hence, we
propose a measure to determine the differential knowledge
and guide the degree of knowledge infusion in learning. As
our initial approach to this measure, we developed a
two-state regularized loss function utilizing Kullback-Leibler
(KL) divergence. Our choice of the KL divergence measure is
largely influenced by the Markov assumptions made in
language modeling, as highlighted in
          <xref ref-type="bibr" rid="ref50">(Longworth
2010)</xref>
          . The K-LF measure estimates the divergence between
the hidden representations (h_{T-1}, h_T) and the knowledge
representation (K_e) to determine the differential knowledge to
be infused.
        </p>
        <p>Formally we define it as:</p>
        <sec id="sec-4-4-1">
          <title>K-LF</title>
          <p>K-LF = min D_KL(h_T || K_e),
s.t. D_KL(h_T || K_e) &lt; D_KL(h_{T-1} || K_e),
(2)
where h_{T-1} is an input and arg min over (h_{T-1}, h_T, K_e) serves as
the convergence constraint. We minimize the relative entropy for information loss to
maximize the information gain from the knowledge
representation (e.g., K_e). We compute the differential
knowledge (∇K-LF) through this optimization approach; thus,
the computed differential knowledge also determines the
degree of knowledge to be infused. ∇K-LF is
computed in the form of embedding vectors, and the dimensions
of K_e are preserved.</p>
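As a toy illustration of K-LF (Equation 2), the snippet below treats the hidden states and the knowledge embedding as nonnegative vectors normalized into distributions (a simplifying assumption of ours) and returns the per-dimension differential knowledge ∇K-LF together with the constraint check:

```python
import numpy as np

def normalize(v, eps=1e-12):
    """Turn a vector into a probability distribution (assumption:
    magnitudes are treated as unnormalized probabilities)."""
    v = np.abs(v) + eps
    return v / v.sum()

def k_lf(h_T, h_T1, K_e):
    """K-LF sketch (Eq. 2): KL divergences of the last two hidden
    states from the knowledge embedding, plus the per-dimension
    differential knowledge, which keeps K_e's dimensionality."""
    p_T, p_T1, q = normalize(h_T), normalize(h_T1), normalize(K_e)
    d_T = float(np.sum(p_T * np.log(p_T / q)))    # D_KL(h_T || K_e)
    d_T1 = float(np.sum(p_T1 * np.log(p_T1 / q))) # D_KL(h_{T-1} || K_e)
    grad_klf = p_T * np.log(p_T / q)              # differential knowledge
    return grad_klf, d_T, d_T1, d_T < d_T1        # Eq. 2 constraint
```

Note that `grad_klf` has the same dimensionality as K_e, matching the requirement above that the dimensions of K_e are preserved.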
          <p>
            K-MF: Knowledge Modulation Function We need to
merge the differential knowledge representation with the
partially learned representation. However, this operation
cannot be done arbitrarily, as the vector spaces of the two
representations differ in both dimension and distribution, if
not the same
            <xref ref-type="bibr" rid="ref21 ref82">(Dumančić and Blockeel 2017)</xref>
            . We explain an
initial approach for the K-MF that modulates the learned weight
matrix of the neural network with the hidden vector through
an appropriate operation (e.g., Hadamard pointwise
multiplication). This operation at the Tth layer can be formulated
as:
          </p>
          <p>
            W_hk = W_hk − λ_k ∇K-LF, where W_hk is
the learned weight matrix infusing knowledge, λ_k is the
learning momentum
            <xref ref-type="bibr" rid="ref77">(Sutskever et al. 2013)</xref>
            , and ∇K-LF is the
differential knowledge. The weight matrix (W_hk) is computed
through the learning epochs utilizing the differential
knowledge embedding (∇K-LF). Then we merge W_hk with the
hidden vector h_T through the K-MF. Considering that we
use Hadamard pointwise multiplication as our initial
approach, we formally define the output M_T of K-MF at the Tth layer as:
M_T = h_T ⊙ W_hk,
(3)
where M_T is the knowledge-modulated representation, h_T
is the hidden vector, and W_hk is the learned weight matrix
infusing knowledge. Further investigation of techniques for
K-MF constitutes a central research topic for the research
community.</p>
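A minimal numeric sketch of the K-MF step, i.e., a weight update followed by the Hadamard product of Equation 3; the scalar momentum value and the matching elementwise shapes are our assumptions:

```python
import numpy as np

def k_mf(h_T, W_hk, grad_klf, momentum=0.9):
    """K-MF sketch: update the knowledge-infusing weights with the
    differential knowledge grad_klf (momentum assumed scalar), then
    modulate the hidden state elementwise (Eq. 3): M_T = h_T * W_hk."""
    W_hk = W_hk - momentum * grad_klf  # weight update using the differential knowledge
    M_T = h_T * W_hk                   # Hadamard pointwise multiplication
    return M_T, W_hk
```

For example, with `h_T = [1, 2]`, `W_hk = [1, 1]`, and `grad_klf = [0.5, 0.5]`, the update yields `W_hk = [0.55, 0.55]` and `M_T = [0.55, 1.1]`.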
        </sec>
      </sec>
      <sec id="sec-4-5">
        <title>Differential Knowledge Engine</title>
        <p>In deep neural networks, each epoch generates an error that
is back-propagated until the model reaches a saddle point
in the local minima, and the error is reduced in each epoch.
The error indicates the difference between the probabilities of
actual and predicted labels, and this difference can be used
to enrich the Seeded SubKG in our proposed
knowledge-infused learning (K-IL) framework.</p>
        <p>In this subsection, we discuss the sub-knowledge graph
operations that are based on the difference between the
learned representation of our knowledge-infused model
(M_T) and the representation of the relevant sub-knowledge
graph from the KG, which we name the differential
sub-knowledge graph. We define a Knowledge Proximity
function to generate the Differential Sub-knowledge Graph,
and an Update Seeded SubKG function to insert the differential
sub-knowledge graph into the Seeded SubKG.</p>
        <p>Knowledge Proximity Upon the arrival of the learned
representation from the knowledge-infused learning model,
we query the KG to retrieve information related to the
respective data point. In this particular step, it is important
to find the optimal proximity between the concept and its
related concepts. For example, from the “South Carolina”
concept, we may traverse the surrounding concepts with a
varying number of hops (empirically decided). Finding the
optimal number of hops in each direction from the
concept in question is still an open research question. Once
we find the optimal proximity of a particular concept in the KG,
we traverse the KG based on this proximity, starting from the
concept in question.</p>
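The hop-bounded traversal described above can be sketched as a breadth-first search over a toy adjacency-dict KG; the graph structure and concept names below are illustrative stand-ins for a real knowledge graph:

```python
from collections import deque

def k_hop_concepts(kg, seed, hops):
    """Collect all concepts within `hops` hops of `seed` via BFS.
    `kg` is a simple adjacency dict standing in for a real KG."""
    depth = {seed: 0}
    queue = deque([seed])
    while queue:
        node = queue.popleft()
        if depth[node] == hops:
            continue  # do not expand beyond the hop limit
        for neighbor in kg.get(node, ()):
            if neighbor not in depth:
                depth[neighbor] = depth[node] + 1
                queue.append(neighbor)
    return set(depth)
```

Sweeping the `hops` parameter over a validation set is one way to decide it empirically, as the text suggests.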
        <p>Differential SubKG Once we obtain the SubKG from the
graph traversal, we create a differential SubKG that
reflects the difference in knowledge from the Seeded SubKG.
For this procedure, research is needed to formulate the
problem using variational autoencoders to extract a differential
subKG (D_kg), which we believe will provide missing
information in the Seeded KG.</p>
        <p>
          Update function The differential subKG generated as a
result of minimizing knowledge proximity is considered as
an input factual graph to the update procedure. As a
result, the procedure dynamically evolves the Seeded SubKG
with missing information from the differential subKG. We
propose to utilize the Lyapunov stability theorem
          <xref ref-type="bibr" rid="ref24 ref49 ref83">(Liu, Zhang, and
Chen 2014)</xref>
          and zero-shot learning to update the
Seeded KG using D_kg. D_kg and the Seeded KG represent two
knowledge structures, requiring a process of transferring the
knowledge from one structure to the other (Hamaguchi et al. 2017).
We define this process as the generation of semantic mapping
weights that encode and decode the two semantic spaces,
utilizing the Lyapunov stability constraint and a Sylvester
optimization approach. Given two semantic spaces belonging
to a domain D (e.g., online extremism, mental health), we
aim to attain an equilibrium position defined as:
||S_kg − W D_kg||_F^2 = λ ||W S_kg − D_kg||_F^2,
(4)
where || · ||_F represents the Frobenius norm and λ is a
proportionality constant belonging to R. Equation 4 reflects the Lyapunov
stability theorem, and to achieve such a stable state we define
our optimization function as follows:
L = min(||S_kg − W D_kg||_F^2 − λ ||W S_kg − D_kg||_F^2),
λ &gt; 0, W ∈ R^{R×R}.
(5)</p>
        <p>
          Equation 5 is solvable using Sylvester optimization and
its derivation is defined in a recent study
          <xref ref-type="bibr" rid="ref22 ref3 ref4 ref40">(Gaur et al. 2018)</xref>
          .
        </p>
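Objectives of this shape admit a closed-form Sylvester solution. The sketch below is not the authors' derivation: it assumes the convex, tied-weight semantic-autoencoder variant min_W ||D − WᵀS||_F² + λ||W D − S||_F², whose stationarity condition (S Sᵀ) W + W (λ D Dᵀ) = (1+λ) S Dᵀ is a standard Sylvester system solvable with SciPy:

```python
import numpy as np
from scipy.linalg import solve_sylvester

def semantic_mapping(S_kg, D_kg, lam=1.0):
    """Solve for semantic mapping weights W between the Seeded-KG
    space S_kg and the differential-subKG space D_kg (rows are
    embedding dimensions, columns are entities). Stationarity of the
    tied-weight objective gives the Sylvester equation
        (S S^T) W + W (lam * D D^T) = (1 + lam) * S D^T."""
    A = S_kg @ S_kg.T
    B = lam * (D_kg @ D_kg.T)
    C = (1.0 + lam) * (S_kg @ D_kg.T)
    return solve_sylvester(A, B, C)  # solves A W + W B = C
```

As a sanity check, mapping a space onto itself recovers the identity matrix, i.e., the two knowledge structures are already in equilibrium.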
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Applications for K-IL</title>
      <p>Artificial intelligence models will be widely deployed in
real-world decision-making processes in the foreseeable future,
once the challenges described in Section 1 are overcome. As
we argue that the incorporation of external structured
knowledge will address these challenges, it will benefit various
application domains such as social and health sciences,
automating processes that require knowledge and intelligence.
Specifically, it will have a potentially significant impact on
predictive analysis of online communications such as
misinformation and extremism, conversational modeling, and
disease prediction.</p>
      <p>
          As predicting online extremism is challenging and false
alarms have serious implications, potentially affecting
millions of individuals,
        <xref ref-type="bibr" rid="ref23 ref41 ref42 ref58 ref73">(Kursuncu et al. 2019a)</xref>
        showcased that
the (shallow) infusion of external domain-specific
knowledge improves precision, reducing potential social
        discrimination. Further, in the prediction of mental health diseases
defined in DSM-5,
        <xref ref-type="bibr" rid="ref22 ref3 ref4 ref40">(Gaur et al. 2018)</xref>
        showed that shallow knowledge
infusion reduces false alarms by 30%. On the other hand,
conversational models pose an important application area as
        <xref ref-type="bibr" rid="ref13 ref17 ref47 ref48">(Liu
et al. 2019b)</xref>
        proposed a conversation framework where the
fusion of KGs and text mutually reinforce each other to
generate knowledge-aware responses, improving the model in
generalizability and explainability. In another study,
        <xref ref-type="bibr" rid="ref86">(Young
et al. 2018)</xref>
        integrated commonsense knowledge into
conversational models to select the most appropriate
response. While machine learning finds many application
areas in medicine for disease prediction, large data is not
always available. In this case, knowledge-infused learning
generates more representative features, thereby avoiding
overfitting. A study
        <xref ref-type="bibr" rid="ref79">(Tan et al. 2019)</xref>
        on early diagnosis of lung
cancer using computed tomography images, infused
knowledge in the form of expert-curated features into the
learning process through CNN. Despite the small data set, the
enriched feature space in their knowledge-infused learning
process improved sensitivity and specificity of the model.
      </p>
      <p>In contrast to the applications above, we believe that the
deep infusion of external knowledge within latent layers will
enhance the coverage of the information being learned by
the model based on KGs. Hence, it will bring better
generalizability, reduced bias and false alarms,
disambiguation, less reliance on large data, explainability,
reliability, and robustness to real-world applications in the
critical domains mentioned above, with significant impact.</p>
    </sec>
    <sec id="sec-6">
      <title>Conclusion</title>
      <p>Combining deep learning and knowledge graphs in a hybrid
neural-symbolic learning framework will further enhance
performance and accelerate the convergence of the
learning processes. Specifically, the impact of this improvement
in very sensitive domains, such as health and social science,
will be significant with respect to its implications for
real-world deployment. Adoption of the tools that automate tasks
that require knowledge and intelligence, and are traditionally
done by humans, will improve with the help of this
framework that marries deep learning and knowledge graph
techniques. Specifically, we envision that the infusion of
knowledge as described in this framework will capture information
for the corresponding domain in finer granularity of
abstraction. We believe that this approach will provide reliable
solutions to the problems faced in deep learning, as described
in Sections 1 and 5. Hence, in real world applications,
resolving these issues with both knowledge graphs and deep
learning in a hybrid neuro-symbolic framework will greatly
contribute to fulfilling AI’s promise.</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgement</title>
      <p>We acknowledge partial support from the National Science
Foundation (NSF) award CNS-1513721: “Context-Aware
Harassment Detection on Social Media". Any opinions,
conclusions or recommendations expressed in this material are
those of the authors and do not necessarily reflect the views
of the NSF.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          2018.
          <article-title>Probabilistic fasttext for multi-sense word embeddings</article-title>
          . arXiv preprint arXiv:
          <year>1806</year>
          .02901.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <surname>Baader</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Sertkaya</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Turhan</surname>
          </string-name>
          , A.-Y.
          <year>2007</year>
          .
          <article-title>Computing the least common subsumer wrt a background terminology</article-title>
          .
          <source>Journal of Applied Logic</source>
          <volume>5</volume>
          (
          <issue>3</issue>
          ):
          <fpage>392</fpage>
          -
          <lpage>420</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <surname>Bhatt</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ; Gaur,
          <string-name>
            <given-names>M.</given-names>
            ;
            <surname>Bullemer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            ;
            <surname>Shalin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            ;
            <surname>Sheth</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            ; and
            <surname>Minnery</surname>
          </string-name>
          ,
          <string-name>
            <surname>B.</surname>
          </string-name>
          <year>2018a</year>
          .
          <article-title>Enhancing crowd wisdom using explainable diversity inferred from social media</article-title>
          .
          <source>In 2018 IEEE/WIC/ACM International Conference on Web Intelligence (WI)</source>
          ,
          <fpage>293</fpage>
          -
          <lpage>300</lpage>
          . IEEE.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <surname>Bhatt</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ; Gaur,
          <string-name>
            <given-names>M.</given-names>
            ;
            <surname>Bullemer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            ;
            <surname>Shalin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V. L.</given-names>
            ;
            <surname>Sheth</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. P.</given-names>
            ; and
            <surname>Minnery</surname>
          </string-name>
          ,
          <string-name>
            <surname>B.</surname>
          </string-name>
          <year>2018b</year>
          .
          <article-title>Enhancing crowd wisdom using explainable diversity inferred from social media</article-title>
          .
          <source>In IEEE/WIC/ACM International Conference on Web Intelligence. Santiago</source>
          , Chile: IEEE.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <surname>Bian</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Gao</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ; and Liu, T.-Y.
          <year>2014</year>
          .
          <article-title>Knowledge-powered deep learning for word embedding</article-title>
          .
          <source>In Joint European conference on machine learning and knowledge discovery in databases</source>
          ,
          <volume>132</volume>
          -
          <fpage>148</fpage>
          . Springer.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <surname>Bizer</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Lehmann</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ; Kobilarov,
          <string-name>
            <surname>G.</surname>
          </string-name>
          ; Auer,
          <string-name>
            <given-names>S.</given-names>
            ;
            <surname>Becker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            ;
            <surname>Cyganiak</surname>
          </string-name>
          , R.; and
          <string-name>
            <surname>Hellmann</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <year>2009</year>
          .
          <article-title>Dbpedia-a crystallization point for the web of data</article-title>
          .
          <source>Web Semantics: science, services and agents on the world wide web 7</source>
          (
          <issue>3</issue>
          ):
          <fpage>154</fpage>
          -
          <lpage>165</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <surname>Bollacker</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Evans</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Paritosh</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ; Sturge,
          <string-name>
            <surname>T.</surname>
          </string-name>
          ; and Taylor, J.
          <year>2008</year>
          .
          <article-title>Freebase: a collaboratively created graph database for structuring human knowledge</article-title>
          .
          <source>In Proceedings of the 2008 ACM SIGMOD international conference on Management of data</source>
          ,
          <fpage>1247</fpage>
          -
          <lpage>1250</lpage>
          . AcM.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <surname>Bordes</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Usunier</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Garcia-Duran</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Weston</surname>
          </string-name>
          , J.; and
          <string-name>
            <surname>Yakhnenko</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          <year>2013</year>
          .
          <article-title>Translating embeddings for modeling multi-relational data</article-title>
          .
          <source>In Advances in neural information processing systems</source>
          ,
          <volume>2787</volume>
          -
          <fpage>2795</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <surname>Brouch</surname>
            ,
            <given-names>K. L.</given-names>
          </string-name>
          <year>2000</year>
          .
          <article-title>Where in the world is ICD-10?</article-title>
          <source>Where in the World Is ICD-10?/AHIMA, American Health Information Management Association</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <surname>Cameron</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ; Kavuluru,
          <string-name>
            <surname>R.</surname>
          </string-name>
          ; Rindflesch,
          <string-name>
            <given-names>T. C.</given-names>
            ;
            <surname>Sheth</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. P.</given-names>
            ;
            <surname>Thirunarayan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            ; and
            <surname>Bodenreider</surname>
          </string-name>
          ,
          <string-name>
            <surname>O.</surname>
          </string-name>
          <year>2015</year>
          .
          <article-title>Context-driven automatic subgraph creation for literature-based discovery</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <source>Journal of biomedical informatics</source>
          <volume>54</volume>
          :
          <fpage>141</fpage>
          -
          <lpage>157</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <string-name>
            <given-names>J. F.</given-names>
            ;
            <surname>Maroto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            ;
            <surname>Fernandez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. M.</given-names>
            ;
            <surname>Nenadic</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            ;
            <surname>Klein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            ;
            <surname>Keane</surname>
          </string-name>
          , J.; and Stevens,
          <string-name>
            <surname>R.</surname>
          </string-name>
          <year>2018</year>
          .
          <article-title>Deep learning meets ontologies: experiments to anchor the cardiovascular disease ontology in the biomedical literature</article-title>
          .
          <source>Journal of biomedical semantics 9</source>
          (
          <issue>1</issue>
          ):
          <fpage>13</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Hao</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Cai</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Cai</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ; Wen,
          <string-name>
            <surname>W.</surname>
          </string-name>
          ; Zhu, J.; and Xie,
          <string-name>
            <surname>G.</surname>
          </string-name>
          <year>2019</year>
          .
          <article-title>Embedding logic rules into recurrent neural networks</article-title>
          .
          <source>IEEE Access</source>
          <volume>7</volume>
          :
          <fpage>14938</fpage>
          -
          <lpage>14946</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          <string-name>
            <surname>Cheng</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <year>2018</year>
          .
          <article-title>Ai reasoning systems: Pac and applied methods</article-title>
          . arXiv preprint arXiv:
          <year>1807</year>
          .05054.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          <string-name>
            <surname>Cho</surname>
            ,
            <given-names>K.; Van</given-names>
          </string-name>
          <string-name>
            <surname>Merriënboer</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Gulcehre</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Bahdanau</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Bougares</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Schwenk</surname>
            , H.; and Bengio,
            <given-names>Y.</given-names>
          </string-name>
          <year>2014</year>
          .
          <article-title>Learning phrase representations using rnn encoder-decoder for statistical machine translation</article-title>
          .
          <source>arXiv preprint arXiv:1406</source>
          .
          <fpage>1078</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          <string-name>
            <surname>Corbitt-Hall</surname>
            ,
            <given-names>D. J.</given-names>
          </string-name>
          ; Gauthier,
          <string-name>
            <given-names>J. M.</given-names>
            ;
            <surname>Davis</surname>
          </string-name>
          , M. T.; and Witte,
          <string-name>
            <surname>T. K.</surname>
          </string-name>
          <year>2016</year>
          .
          <article-title>College students' responses to suicidal content on social networking sites: an examination using a simulated facebook newsfeed</article-title>
          .
          <source>Suicide and Life-Threatening Behavior</source>
          <volume>46</volume>
          (
          <issue>5</issue>
          ):
          <fpage>609</fpage>
          -
          <lpage>624</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          <string-name>
            <surname>De Palma</surname>
          </string-name>
          , G.;
          <string-name>
            <surname>Kiani</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Lloyd</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <year>2019</year>
          .
          <article-title>Random deep neural networks are biased towards simple functions</article-title>
          .
          <source>In Advances in Neural Information Processing Systems</source>
          ,
          <fpage>1962</fpage>
          -
          <lpage>1974</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          <string-name>
            <surname>Devlin</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ; Chang, M.-W.;
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Toutanova</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          <article-title>Bert: Pre-training of deep bidirectional transformers for language understanding</article-title>
          . arXiv preprint arXiv:
          <year>1810</year>
          .04805.
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          <string-name>
            <surname>Dugas</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Bengio</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Bélisle</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Nadeau</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ; and Garcia,
          <string-name>
            <surname>R.</surname>
          </string-name>
          <year>2009</year>
          .
          <article-title>Incorporating functional knowledge in neural networks</article-title>
          .
          <source>Journal of Machine Learning Research</source>
          <volume>10</volume>
          (Jun):
          <fpage>1239</fpage>
          -
          <lpage>1262</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          <string-name>
            <surname>Dumančić</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Blockeel</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          <year>2017</year>
          .
          <article-title>Demystifying relational latent representations</article-title>
          .
          <source>In International Conference on Inductive Logic Programming</source>
          ,
          <fpage>63</fpage>
          -
          <lpage>77</lpage>
          . Springer.
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          <string-name>
            <surname>Gaur</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Kursuncu</surname>
            ,
            <given-names>U.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Alambo</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Sheth</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Daniulaityte</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Thirunarayan</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Pathak</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <year>2018</year>
          .
          <article-title>"Let me tell you about your mental health!" Contextualized classification of Reddit posts to DSM-5 for web-based intervention</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          <string-name>
            <surname>Gaur</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Alambo</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Sain</surname>
            ,
            <given-names>J. P.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Kursuncu</surname>
            ,
            <given-names>U.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Thirunarayan</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Kavuluru</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Sheth</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Welton</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Pathak</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <year>2019</year>
          .
          <article-title>Knowledge-aware assessment of severity of suicide risk for early intervention</article-title>
          .
          <source>In The World Wide Web Conference</source>
          ,
          <fpage>514</fpage>
          -
          <lpage>525</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          <string-name>
            <surname>Goldberg</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Levy</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          <year>2014</year>
          .
          <article-title>word2vec explained: deriving Mikolov et al.'s negative-sampling word-embedding method</article-title>
          .
          <source>arXiv preprint arXiv:1402.3722</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          <string-name>
            <surname>Greff</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Srivastava</surname>
            ,
            <given-names>R. K.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Koutník</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Steunebrink</surname>
            ,
            <given-names>B. R.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Schmidhuber</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <year>2017</year>
          .
          <article-title>LSTM: A search space odyssey</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          <source>IEEE transactions on neural networks and learning systems 28(10)</source>
          :
          <fpage>2222</fpage>
          -
          <lpage>2232</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          <string-name>
            <surname>Gruber</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          <year>2008</year>
          .
          <article-title>Ontology</article-title>
          .
          <source>Encyclopedia of Database Systems</source>
          , Ling Liu and M. Tamer Özsu (Eds.).
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          <string-name>
            <surname>Halevy</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Norvig</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Pereira</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          <year>2009</year>
          .
          <article-title>The unreasonable effectiveness of data</article-title>
          .
          <source>IEEE Intelligent Systems</source>
          <volume>24</volume>
          (
          <issue>2</issue>
          ):
          <fpage>8</fpage>
          -
          <lpage>12</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          2017.
          <article-title>Knowledge transfer for out-of-knowledge-base entities: a graph neural network approach</article-title>
          .
          <source>arXiv preprint arXiv:1706.05674</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          <string-name>
            <surname>Hinton</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Vinyals</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Dean</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <year>2015</year>
          .
          <article-title>Distilling the knowledge in a neural network</article-title>
          .
          <source>arXiv preprint arXiv:1503.02531</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          <string-name>
            <surname>Hochreiter</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Schmidhuber</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <year>1997</year>
          .
          <article-title>Long short-term memory</article-title>
          .
          <source>Neural computation</source>
          <volume>9</volume>
          (
          <issue>8</issue>
          ):
          <fpage>1735</fpage>
          -
          <lpage>1780</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          2013.
          <article-title>Yago2: A spatially and temporally enhanced knowledge base from wikipedia</article-title>
          .
          <source>Artificial Intelligence</source>
          <volume>194</volume>
          :
          <fpage>28</fpage>
          -
          <lpage>61</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          <string-name>
            <surname>Hu</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Ma</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Hovy</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Xing</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          <article-title>Harnessing deep neural networks with logic rules</article-title>
          .
          <source>arXiv preprint arXiv:1603.06318</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          <string-name>
            <surname>Islam</surname>
            ,
            <given-names>S. R.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Eberle</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Bundy</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Ghafoor</surname>
            ,
            <given-names>S. K.</given-names>
          </string-name>
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
          <article-title>Infusing domain knowledge in AI-based "black box" models for better explainability with application in bankruptcy prediction</article-title>
          .
          <source>arXiv preprint arXiv:1905.11474</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref37">
        <mixed-citation>
          <string-name>
            <surname>Karpathy</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <year>2015</year>
          .
          <article-title>The unreasonable effectiveness of recurrent neural networks</article-title>
          .
          <source>Andrej Karpathy blog 21.</source>
        </mixed-citation>
      </ref>
      <ref id="ref38">
        <mixed-citation>
          <string-name>
            <surname>Kho</surname>
            ,
            <given-names>S. J.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Padhee</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Bajaj</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Thirunarayan</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Sheth</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <year>2019</year>
          .
          <article-title>Domain-specific use cases for knowledge-enabled social media analysis</article-title>
          .
          <source>In Emerging Research Challenges and Opportunities in Computational Social Network Analysis and Mining</source>
          . Springer.
          <fpage>233</fpage>
          -
          <lpage>246</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref39">
        <mixed-citation>
          <string-name>
            <surname>Kimmig</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Bach</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Broecheler</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Huang</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Getoor</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          <year>2012</year>
          .
          <article-title>A short introduction to probabilistic soft logic</article-title>
          .
          <source>In Proceedings of the NIPS Workshop on Probabilistic Programming: Foundations and Applications</source>
          ,
          <fpage>1</fpage>
          -
          <lpage>4</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref40">
        <mixed-citation>
          <string-name>
            <surname>Kursuncu</surname>
            ,
            <given-names>U.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Gaur</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Lokala</surname>
            ,
            <given-names>U.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Illendula</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Thirunarayan</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Daniulaityte</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Sheth</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Arpinar</surname>
            ,
            <given-names>I. B.</given-names>
          </string-name>
          <year>2018</year>
          .
          <article-title>"What's ur type?" Contextualized classification of user types in marijuana-related communications using compositional multiview embedding</article-title>
          .
          <source>In IEEE/WIC/ACM International Conference on Web Intelligence(WI'18).</source>
        </mixed-citation>
      </ref>
      <ref id="ref41">
        <mixed-citation>
          <string-name>
            <surname>Kursuncu</surname>
            ,
            <given-names>U.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Gaur</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Castillo</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Alambo</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Thirunarayan</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Shalin</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Achilov</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Arpinar</surname>
            ,
            <given-names>I. B.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Sheth</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <year>2019a</year>
          .
          <article-title>Modeling Islamist extremist communications on social media using contextual dimensions: Religion, ideology, and hate</article-title>
          .
          <source>Proceedings of the ACM on Human-Computer Interaction</source>
          <volume>3</volume>
          (
          <issue>CSCW</issue>
          ):
          <fpage>151</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref42">
        <mixed-citation>
          <string-name>
            <surname>Kursuncu</surname>
            ,
            <given-names>U.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Gaur</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Lokala</surname>
            ,
            <given-names>U.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Thirunarayan</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Sheth</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Arpinar</surname>
            ,
            <given-names>I. B.</given-names>
          </string-name>
          <year>2019b</year>
          .
          <article-title>Predictive analysis on twitter: Techniques and applications</article-title>
          .
          <source>In Emerging Research Challenges and Opportunities in Computational Social Network Analysis and Mining</source>
          . Springer.
          <fpage>67</fpage>
          -
          <lpage>104</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref43">
        <mixed-citation>
          <year>2019c</year>
          .
          <article-title>Explainability of medical AI through domain knowledge</article-title>
          .
          <source>Ontology Summit 2019</source>
          , Medical Explanation.
        </mixed-citation>
      </ref>
      <ref id="ref44">
        <mixed-citation>
          <string-name>
            <surname>Kursuncu</surname>
            ,
            <given-names>U.</given-names>
          </string-name>
          <year>2018</year>
          .
          <article-title>Modeling the Persona in Persuasive Discourse on Social Media Using Context-aware and Knowledge-driven Learning</article-title>
          .
          <source>Ph.D. Dissertation</source>
          , University of Georgia.
        </mixed-citation>
      </ref>
      <ref id="ref45">
        <mixed-citation>
          <string-name>
            <surname>Lalithsena</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <year>2018</year>
          .
          <article-title>Domain-specific knowledge extraction from the web of data</article-title>
          .
          <source>Ph.D. Dissertation</source>
          , Wright State University.
        </mixed-citation>
      </ref>
      <ref id="ref46">
        <mixed-citation>
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Singh</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          <year>2004</year>
          .
          <article-title>Commonsense reasoning in and over natural language</article-title>
          .
          <source>In International Conference on Knowledge-Based and Intelligent Information and Engineering Systems</source>
          ,
          <fpage>293</fpage>
          -
          <lpage>306</lpage>
          . Springer.
        </mixed-citation>
      </ref>
      <ref id="ref47">
        <mixed-citation>
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Zhou</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Zhao</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Ju</surname>
            ,
            <given-names>Q.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Deng</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          <year>2019a</year>
          .
          <article-title>K-BERT: Enabling language representation with knowledge graph</article-title>
          .
          <source>arXiv preprint arXiv:1909.07606</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref48">
        <mixed-citation>
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Niu</surname>
            ,
            <given-names>Z.-Y.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Wu</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          <year>2019b</year>
          .
          <article-title>Knowledge aware conversation generation with explainable reasoning over augmented graphs</article-title>
          .
          <source>In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)</source>
          ,
          <fpage>1782</fpage>
          -
          <lpage>1792</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref49">
        <mixed-citation>
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <year>2014</year>
          .
          <article-title>Attribute relation learning for zero-shot classification</article-title>
          .
          <source>Neurocomputing</source>
          <volume>139</volume>
          :
          <fpage>34</fpage>
          -
          <lpage>46</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref50">
        <mixed-citation>
          <string-name>
            <surname>Longworth</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          <year>2010</year>
          .
          <article-title>Kernel methods for text-independent speaker verification</article-title>
          .
          <source>Ph.D. Dissertation</source>
          , University of Cambridge.
        </mixed-citation>
      </ref>
      <ref id="ref51">
        <mixed-citation>
          <string-name>
            <surname>Masse</surname>
            ,
            <given-names>N. Y.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Grant</surname>
            ,
            <given-names>G. D.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Freedman</surname>
            ,
            <given-names>D. J.</given-names>
          </string-name>
          <year>2018</year>
          .
          <article-title>Alleviating catastrophic forgetting using context-dependent gating and synaptic stabilization</article-title>
          .
          <source>Proceedings of the National Academy of Sciences</source>
          <volume>115</volume>
          (
          <issue>44</issue>
          ):
          <fpage>E10467</fpage>
          -
          <lpage>E10475</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref52">
        <mixed-citation>
          <string-name>
            <surname>Maurya</surname>
            ,
            <given-names>A. K.</given-names>
          </string-name>
          <year>2018</year>
          .
          <article-title>Learning low dimensional word based linear classifiers using data shared adaptive bootstrap aggregated lasso with application to IMDB data</article-title>
          .
          <source>arXiv preprint arXiv:1807.10623</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref53">
        <mixed-citation>
          <string-name>
            <surname>McInnes</surname>
            ,
            <given-names>B. T.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Pedersen</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Pakhomov</surname>
            ,
            <given-names>S. V.</given-names>
          </string-name>
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref54">
        <mixed-citation>
          <article-title>UMLS-Interface and UMLS-Similarity: open source software for measuring paths and semantic similarity</article-title>
          .
          <source>In AMIA Annual Symposium Proceedings</source>
          , volume
          <volume>2009</volume>
          ,
          <fpage>431</fpage>
          . American Medical Informatics Association.
        </mixed-citation>
      </ref>
      <ref id="ref55">
        <mixed-citation>
          <string-name>
            <surname>Nickel</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Rosasco</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Poggio</surname>
            ,
            <given-names>T. A.</given-names>
          </string-name>
          ; et al.
          <year>2016</year>
          .
          <article-title>Holographic embeddings of knowledge graphs</article-title>
          .
          <source>In AAAI</source>
          , volume
          <volume>2</volume>
          ,
          <fpage>3</fpage>
          -
          <lpage>2</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref56">
        <mixed-citation>
          <string-name>
            <surname>Ohno-Machado</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Sansone</surname>
            ,
            <given-names>S.-A.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Alter</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Fore</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Grethe</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Xu</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Gonzalez-Beltran</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Rocca-Serra</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Gururaj</surname>
            ,
            <given-names>A. E.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Bell</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ; et al.
          <year>2017</year>
          .
          <article-title>Finding useful data across multiple biomedical data repositories using DataMed.</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref57">
        <mixed-citation>
          <source>Nature Genetics</source>
          <volume>49</volume>
          (
          <issue>6</issue>
          ):
          <fpage>816</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref58">
        <mixed-citation>
          <string-name>
            <surname>Olteanu</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Castillo</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Diaz</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Kiciman</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref59">
        <mixed-citation>
          <article-title>Social data: Biases, methodological pitfalls, and ethical boundaries</article-title>
          .
          <source>Frontiers in Big Data</source>
          <volume>2</volume>
          :
          <fpage>13</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref60">
        <mixed-citation>
          <string-name>
            <surname>Palatucci</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Pomerleau</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Hinton</surname>
            ,
            <given-names>G. E.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Mitchell</surname>
            ,
            <given-names>T. M.</given-names>
          </string-name>
          <year>2009</year>
          .
          <article-title>Zero-shot learning with semantic output codes</article-title>
          .
          <source>In Advances in neural information processing systems</source>
          ,
          <fpage>1410</fpage>
          -
          <lpage>1418</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref61">
        <mixed-citation>
          <string-name>
            <surname>Pennington</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Socher</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Manning</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          <year>2014</year>
          .
          <article-title>GloVe: Global vectors for word representation</article-title>
          .
          <source>In Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP)</source>
          ,
          <fpage>1532</fpage>
          -
          <lpage>1543</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref62">
        <mixed-citation>
          <string-name>
            <surname>Perera</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Mendes</surname>
            ,
            <given-names>P. N.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Alex</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Sheth</surname>
            ,
            <given-names>A. P.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Thirunarayan</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          <year>2016</year>
          .
          <article-title>Implicit entity linking in tweets</article-title>
          .
          <source>In International Semantic Web Conference</source>
          ,
          <fpage>118</fpage>
          -
          <lpage>132</lpage>
          . Springer.
        </mixed-citation>
      </ref>
      <ref id="ref63">
        <mixed-citation>
          <string-name>
            <surname>Roy</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Park</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Pan</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <year>2017</year>
          .
          <article-title>Learning domain-specific word embeddings from sparse cybersecurity texts</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref64">
        <mixed-citation>
          <source>arXiv preprint arXiv:1709.07470</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref65">
        <mixed-citation>
          <string-name>
            <surname>Rudin</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          <year>2019</year>
          .
          <article-title>Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead</article-title>
          .
          <source>Nature Machine Intelligence</source>
          <volume>1</volume>
          (
          <issue>5</issue>
          ):
          <fpage>206</fpage>
          -
          <lpage>215</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref66">
        <mixed-citation>
          <string-name>
            <surname>Rush</surname>
            ,
            <given-names>A. M.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Chopra</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Weston</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <year>2015</year>
          .
          <article-title>A neural attention model for abstractive sentence summarization</article-title>
          .
          <source>arXiv preprint arXiv:1509.00685</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref67">
        <mixed-citation>
          <string-name>
            <surname>Sarker</surname>
            ,
            <given-names>M. K.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Xie</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Doran</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Raymer</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Hitzler</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          <year>2017</year>
          .
          <article-title>Explaining trained neural networks with semantic web technologies: First steps</article-title>
          .
          <source>arXiv preprint arXiv:1710.04324</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref68">
        <mixed-citation>
          <string-name>
            <surname>Scarlini</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Pasini</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Navigli</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          <year>2020</year>
          .
          <article-title>Sensembert: Context-enhanced sense embeddings for multilingual word sense disambiguation</article-title>
          .
          <source>In Proc. of AAAI.</source>
        </mixed-citation>
      </ref>
      <ref id="ref69">
        <mixed-citation>
          <string-name>
            <surname>Shen</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Deng</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Du</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Fan</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Lei</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          <year>2018</year>
          .
          <article-title>Knowledge-aware attentive neural network for ranking question answer pairs</article-title>
          .
          <source>In The 41st International ACM SIGIR Conference on Research &amp; Development in Information Retrieval</source>
          ,
          <fpage>901</fpage>
          -
          <lpage>904</lpage>
          . ACM.
        </mixed-citation>
      </ref>
      <ref id="ref70">
        <mixed-citation>
          <string-name>
            <surname>Sheth</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Kapanipathi</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          <year>2016</year>
          .
          <article-title>Semantic filtering for social data</article-title>
          .
          <source>IEEE Internet Computing</source>
          <volume>20</volume>
          (
          <issue>4</issue>
          ):
          <fpage>74</fpage>
          -
          <lpage>78</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref71">
        <mixed-citation>
          <string-name>
            <surname>Sheth</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Thirunarayan</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          <year>2012</year>
          .
          <article-title>Semantics empowered web 3.0: managing enterprise, social, sensor, and cloud-based data and services for advanced applications</article-title>
          .
          <source>Synthesis Lectures on Data Management</source>
          <volume>4</volume>
          (
          <issue>6</issue>
          ):
          <fpage>1</fpage>
          -
          <lpage>175</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref72">
        <mixed-citation>
          <string-name>
            <surname>Sheth</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Perera</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Wijeratne</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Thirunarayan</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          <year>2017</year>
          .
          <article-title>Knowledge will propel machine understanding of content: Extrapolating from current examples</article-title>
          .
          <source>arXiv preprint arXiv:1707.05308</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref73">
        <mixed-citation>
          <string-name>
            <surname>Sheth</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Gaur</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Kursuncu</surname>
            ,
            <given-names>U.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Wickramarachchi</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          <year>2019</year>
          .
          <article-title>Shades of knowledge-infused learning for enhancing deep learning</article-title>
          .
          <source>IEEE Internet Computing</source>
          <volume>23</volume>
          (
          <issue>6</issue>
          ):
          <fpage>54</fpage>
          -
          <lpage>63</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref74">
        <mixed-citation>
          <year>2018</year>
          .
          <article-title>Learning from past mistakes: Improving automatic speech recognition output via noisy-clean phrase context modeling</article-title>
          .
          <source>arXiv preprint arXiv:1802.02607</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref75">
        <mixed-citation>
          <string-name>
            <surname>Sun</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Shrivastava</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Singh</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Gupta</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <year>2017</year>
          .
          <article-title>Revisiting unreasonable effectiveness of data in deep learning era</article-title>
          .
          <source>In Computer Vision (ICCV), 2017 IEEE International Conference on</source>
          ,
          <fpage>843</fpage>
          -
          <lpage>852</lpage>
          . IEEE.
        </mixed-citation>
      </ref>
      <ref id="ref76">
        <mixed-citation>
          <string-name>
            <surname>Sun</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Feng</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Tian</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Zhu</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Tian</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Wu</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          <year>2019</year>
          .
          <article-title>ERNIE: Enhanced representation through knowledge integration</article-title>
          .
          <source>arXiv preprint arXiv:1904.09223</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref77">
        <mixed-citation>
          <string-name>
            <surname>Sutskever</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Martens</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Dahl</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Hinton</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref78">
        <mixed-citation>
          <article-title>On the importance of initialization and momentum in deep learning</article-title>
          .
          <source>In International conference on machine learning</source>
          ,
          <fpage>1139</fpage>
          -
          <lpage>1147</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref79">
        <mixed-citation>
          <string-name>
            <surname>Tan</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Huo</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Liang</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          <year>2019</year>
          .
          <article-title>Expert knowledge-infused deep learning for automatic lung nodule detection</article-title>
          .
          <source>Journal of X-ray science and technology</source>
          <volume>27</volume>
          (
          <issue>1</issue>
          ):
          <fpage>17</fpage>
          -
          <lpage>35</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref80">
        <mixed-citation>
          <string-name>
            <surname>Topol</surname>
            ,
            <given-names>E. J.</given-names>
          </string-name>
          <year>2019</year>
          .
          <article-title>High-performance medicine: the convergence of human and artificial intelligence</article-title>
          .
          <source>Nature Medicine</source>
          <volume>25</volume>
          (
          <issue>1</issue>
          ):
          <fpage>44</fpage>
          -
          <lpage>56</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref81">
        <mixed-citation>
          <string-name>
            <surname>Valiant</surname>
            ,
            <given-names>L. G.</given-names>
          </string-name>
          <year>2000</year>
          .
          <article-title>Robust logics</article-title>
          .
          <source>Artificial Intelligence</source>
          <volume>117</volume>
          (
          <issue>2</issue>
          ):
          <fpage>231</fpage>
          -
          <lpage>253</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref82">
        <mixed-citation>
          2017.
          <article-title>Combination of domain knowledge and deep learning for sentiment analysis</article-title>
          .
          <source>In International Workshop on Multi-disciplinary Trends in Artificial Intelligence</source>
          ,
          <fpage>162</fpage>
          -
          <lpage>173</lpage>
          . Springer.
        </mixed-citation>
      </ref>
      <ref id="ref83">
        <mixed-citation>
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Feng</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          <year>2014</year>
          .
          <article-title>Knowledge graph embedding by translating on hyperplanes</article-title>
          .
          <source>In AAAI</source>
          , volume
          <volume>14</volume>
          ,
          <fpage>1112</fpage>
          -
          <lpage>1119</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref84">
        <mixed-citation>
          <string-name>
            <surname>Yi</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Jian</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Zheng</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref85">
        <mixed-citation>
          <article-title>Knowledge-based recurrent attentive neural network for traffic sign detection</article-title>
          .
          <source>arXiv preprint arXiv:1803.05263</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref86">
        <mixed-citation>
          <string-name>
            <surname>Young</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Cambria</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Chaturvedi</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Zhou</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Biswas</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Huang</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <year>2018</year>
          .
          <article-title>Augmenting end-to-end dialogue systems with commonsense knowledge</article-title>
          .
          <source>In Thirty-Second AAAI Conference on Artificial Intelligence.</source>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>