<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Exploiting ontologies for deep learning: a case for sentiment mining?</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>TNO, Data Science Department</institution>
          ,
          <addr-line>Anna van Buerenplein 1, The Hague, 2595 DA</addr-line>
          ,
          <country country="NL">The Netherlands</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>We present a practical method for explaining deep learning-based text mining with ontology-based information. Our approach uses the recently proposed OntoSenticNet ontology for sentiment mining, and consists of a composite deep learning sentiment classifier endowed with an ontology-driven attention module. The attention module analyzes the attention the neural network pays to semantic labels assigned to bigrams in input texts.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Deep learning continues to achieve state-of-the-art performance in a variety of
domains, such as image analysis and text mining. Despite this success, deep learning
models remain opaque: it is hard to understand what knowledge is
represented in them and how they generate decisions (see [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] for discussion). The
field of explainable AI is gaining traction. Promising results have
been reported with attention-based models [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] and latent-space analysis [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ].
The link between ontologies and deep learning is actively being explored. For
instance, [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] addresses the extraction of OWL information from raw text with deep
learning, and [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] applies deep learning to ontology reasoning. Our approach
attempts to leverage the semantic information in ontologies for explaining deep
text mining, using neural attention and word embeddings. Ontologies usually
contain structured, encyclopedic knowledge, arranged in a semantic, conceptual
structure. One such ontology is the recently proposed sentiment ontology
OntoSenticNet [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], an extension of the SenticNet ontology. SenticNet (Figure 1(a))
links entities via an intermediate concept level (consisting of semantic categories
and relations) to an affective level describing sentiment-based associations, like
sadness or joy. OntoSenticNet uses SenticNet to derive affective associations for
words and phrases. It is automatically compiled from affective analyses
performed with WordNet-Affect, Open Mind Common Sense and GECKA. Figure
1(b) lists the OntoSenticNet entry for "wrong food". The primitiveURI nodes
contain the affective labels associated with the multi-word expression "wrong
food". The semantics nodes express associations with other NamedIndividuals
(expressions), based on corpus-based evidence such as collocations, and the static
knowledge contained in SenticNet.
      </p>
      <p>[Figure 1: (a) SenticNet; (b) OntoSenticNet entry for "wrong food"; (c) Process flow; (d) Model]</p>
      <p>The research reported in this paper has been carried out within the Research Programme Applied AI of TNO.</p>
      <p>
        We embed the ontology information into the
sentiment analysis process directly, combining it with non-ontological
information such as textual features. Taking advantage of the attention a neural network
pays to the extra ontology-based information will allow us to decompose its
decisions semantically. We start (Figure 1(c)) by generating vector representations
of our input data, using 100-dimensional GloVe vectors [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], which were derived
from 6 billion words of a 2014 dump of English Wikipedia.
Every document is represented as the sum of the GloVe vectors of its
constituent words, normalized for the length of the document.
      </p>
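      <p>As a minimal sketch of this representation step (function and file names are illustrative; we assume the standard glove.6B.100d.txt file), the document vector can be computed as follows:</p>
      <preformat>
import numpy as np

def load_glove(path='glove.6B.100d.txt'):
    # Parse the GloVe text format: one word per line, followed by its vector.
    vectors = {}
    with open(path, encoding='utf-8') as f:
        for line in f:
            parts = line.rstrip().split(' ')
            vectors[parts[0]] = np.asarray(parts[1:], dtype='float32')
    return vectors

def document_vector(tokens, glove, dim=100):
    # Sum the GloVe vectors of the document's known words,
    # normalized by the length of the document.
    vec = np.zeros(dim, dtype='float32')
    for t in tokens:
        if t in glove:
            vec += glove[t]
    return vec / max(len(tokens), 1)
      </preformat>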
      <p>
        Subsequently, we chunk every document into bigrams, and perform a beam search over the semantically
labeled bigrams in the OntoSenticNet ontology. As semantic labels for bigrams,
we use the primitiveURI labels, and every combination in OntoSenticNet
generates a unique label. In order to cater for bigrams without overt affective labels,
we randomly sampled 5,000 bigrams from a BBC news corpus (http://mlg.ucd.ie/datasets/bbc.html), and labeled these
bigrams as "bbc". This approach yields, for every dataset we use, a unique set
of semantic labels. Restricting our use of OntoSenticNet to bigrams allows us to
look for contextual matches rather than word-based matches, without
running into sparsity: OntoSenticNet contains 22,935 bigram expressions, and only
3,104 expressions longer than two words. The majority of OntoSenticNet entries
consists of unigrams (26,912 entries). The beam search operation attempts to
retrieve, for each combination (100 in total) of the 10 most similar words per word in
the bigram, an existing bigram from OntoSenticNet. As an example, "bad dinner"
is not in OntoSenticNet, but one of its GloVe expansions ("wrong food") is. Once
such a hit is found, the beam search stops for the given input bigram, the
semantic labels are picked up from OntoSenticNet, and search proceeds with the next
bigram in the document. The relation between an OntoSenticNet bigram and its
labels is stored as an entry in a dictionary.
      </p>
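      <p>A minimal sketch of this lookup step follows; we assume onto_labels is a dictionary from OntoSenticNet bigram strings to their primitiveURI label combinations, names are illustrative, and a real implementation would precompute nearest neighbors rather than scan the full vocabulary on every call:</p>
      <preformat>
import numpy as np
from itertools import product

def nearest_words(word, glove, k=10):
    # Return the k most similar words under cosine similarity over GloVe
    # (the word itself scores 1.0, so the literal bigram is tried first).
    if word not in glove:
        return [word]
    words = list(glove)
    mat = np.stack([glove[w] for w in words])
    q = glove[word]
    sims = mat @ q / (np.linalg.norm(mat, axis=1) * np.linalg.norm(q) + 1e-8)
    order = np.argsort(-sims)
    return [words[i] for i in order[:k]]

def label_bigram(bigram, glove, onto_labels, k=10):
    # Beam search: try all k x k combinations (100 by default) of GloVe
    # expansions of the two words, stopping at the first combination that
    # is an entry in the OntoSenticNet label dictionary.
    w1, w2 = bigram
    for c1, c2 in product(nearest_words(w1, glove, k), nearest_words(w2, glove, k)):
        labels = onto_labels.get(c1 + ' ' + c2)
        if labels is not None:
            return labels
    return None  # no affective match for this bigram
      </preformat>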
      <p>
        The attested semantic labels for the bigrams in a document are
counted, and for every document, a count vector (whose length is the total
number of labels attested in the corpus) is generated and
stored. After processing a labeled text corpus in this manner, every document
in the corpus becomes represented by two vectors: a GloVe-based vector, and a
count vector describing the counts for the semantic labels that apply to the
bigrams in the document.
      </p>
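      <p>The count vector can be assembled as in the following sketch, where label_index is an assumed helper mapping that assigns each semantic label attested in the corpus a fixed position:</p>
      <preformat>
import numpy as np

def count_vector(doc_bigram_labels, label_index):
    # doc_bigram_labels: one list of semantic labels per matched bigram in
    # the document; label_index: dict mapping every label attested in the
    # corpus to a fixed vector position.
    counts = np.zeros(len(label_index), dtype='float32')
    for labels in doc_bigram_labels:
        for label in labels:
            counts[label_index[label]] += 1
    return counts
      </preformat>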
      <p>
        Subsequently, we train a neural network (Figure 1(d)) on these joint
representations of labeled documents. The network has two branches,
each equipped with a separate input layer. The first branch processes the ontology
label vectors, and computes attention scores (probabilities) for the various labels
in the vectors. These attention scores indicate the importance ('attention') the
network pays to the ontology labels. They are merged with the GloVe vectors
by concatenation, and this derived representation is used by the second branch to
learn the labeling of documents with sentiment labels. The attention
probabilities are optimized during this process in an end-to-end fashion (they are part of
the overall weight optimization problem the network is solving). Once learning
is complete, the attention scores computed by the trained network for each test
document are extracted, and an image is generated that displays the scores.
      </p>
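      <p>A minimal Keras sketch of such a two-branch network, including a probe for reading out the attention scores afterwards; layer sizes, the optimizer and the softmax attention layer are our assumptions based on the description above, not the authors' exact configuration:</p>
      <preformat>
from tensorflow import keras
from tensorflow.keras import layers

def build_model(n_labels, glove_dim=100, n_classes=2):
    # Branch 1: ontology label count vector -> attention probabilities.
    label_input = keras.Input(shape=(n_labels,), name='onto_labels')
    attention = layers.Dense(n_labels, activation='softmax',
                             name='attention')(label_input)
    # Branch 2: GloVe document vector, concatenated with the attention
    # scores; the merged representation feeds the sentiment classifier.
    glove_input = keras.Input(shape=(glove_dim,), name='glove_doc')
    merged = layers.Concatenate()([glove_input, attention])
    hidden = layers.Dense(64, activation='relu')(merged)
    output = layers.Dense(n_classes, activation='softmax')(hidden)
    model = keras.Model([label_input, glove_input], output)
    # The attention weights are part of the overall optimization problem,
    # so they are trained end-to-end with the sentiment objective.
    model.compile(optimizer='adam', loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    return model

def attention_scores(model, label_vec):
    # Probe the trained network for the attention it pays to each label.
    probe = keras.Model(model.inputs[0], model.get_layer('attention').output)
    return probe.predict(label_vec[None, :])[0]
      </preformat>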
      <p>
        We applied our system to a variety of sentiment labeling datasets: a set
of UCI datasets (https://archive.ics.uci.edu/ml/datasets/Sentiment+Labelled+Sentences)
comprising Yelp, Amazon product and IMDB movie reviews. In addition, we
trained and tested on a subjectivity dataset
(https://www.cs.cornell.edu/people/pabo/movie-review-data/).
      </p>
    </sec>
    <sec id="sec-2">
      <title>Results</title>
      <p>
        Some illustrative results are listed in Figure 2. For the complex emotion
expressed in the sentence "The only thing I did like was the prime rib and the dessert
section", the OntoSenticNet labels anger, sadness, disgust and surprise score
relatively high. The sentences "We'd definitely go back here again" and "Will go back
next trip out" both score high for the joint label joy#surprise. The negative
sentiment of "...least think to refill my water before I struggle to wave you over
for 10 minutes" has significant underpinning from the disgust and anger labels. The
monadic sentiment labels into much richer and more varied descriptions, enhancing
the explainability of monadic sentiment labeling. The explanatory advantages of
our system will be assessed in future work by submitting the generated analyses
to human evaluators in a task-based evaluation setting, and by displaying the
underlying words and phrases used by the model for sentiment decomposition. Our
code will be shared at https://github.com/stephanraaijmakers/deeptext.</p>
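      <p>For illustration, a hypothetical helper (names are ours, building on the attention_scores sketch above) that performs this decomposition by ranking the ontology labels on their attention scores for a given test document:</p>
      <preformat>
def top_affective_labels(scores, label_names, k=5):
    # Rank the ontology labels by the attention mass the trained
    # classifier assigned to them for one test document.
    ranked = sorted(zip(label_names, scores), key=lambda p: p[1], reverse=True)
    # For a negative review this might surface labels such as
    # 'disgust' and 'anger', as in the examples above.
    return ranked[:k]
      </preformat>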
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Dragoni</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Poria</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cambria</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>OntoSenticNet: A commonsense ontology for sentiment analysis</article-title>
          .
          <source>IEEE Intelligent Systems</source>
          <volume>33</volume>
          (
          <issue>2</issue>
          ) (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Hohenecker</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lukasiewicz</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Deep learning for ontology reasoning</article-title>
          .
          <source>CoRR abs/1705.10342</source>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Lipton</surname>
            ,
            <given-names>Z.C.</given-names>
          </string-name>
          :
          <article-title>The mythos of model interpretability</article-title>
          .
          <source>CoRR abs/1606.03490</source>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhu</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Towards better analysis of machine learning models: A visual analytics perspective</article-title>
          .
          <source>Visual Informatics</source>
          <volume>1</volume>
          (
          <issue>1</issue>
          ),
          <fpage>48</fpage>
          -
          <lpage>56</lpage>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Pennington</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Socher</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Manning</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>GloVe: Global vectors for word representation</article-title>
          .
          <source>In: EMNLP</source>
          . vol.
          <volume>14</volume>
          , pp.
          <fpage>1532</fpage>
          -
          <lpage>1543</lpage>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Petrucci</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ghidini</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rospocher</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Ontology learning in the deep</article-title>
          .
          <source>In: 20th International Conference on Knowledge Engineering and Knowledge Management - Volume 10024</source>
          . pp.
          <fpage>480</fpage>
          -
          <lpage>495</lpage>
          . EKAW 2016, Springer-Verlag New York, Inc., New York, NY, USA (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Raaijmakers</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sappelli</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kraaij</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          :
          <article-title>Investigating the interpretability of hidden layers in deep text mining</article-title>
          .
          <source>In: SEMANTICS</source>
          . pp.
          <fpage>177</fpage>
          -
          <lpage>180</lpage>
          . ACM (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>