Exploiting ontologies for deep learning: a case for sentiment mining*

                  Stephan Raaijmakers and Christopher Brewster

TNO, Data Science Department, Anna van Buerenplein 1, The Hague, 2595 DA, The Netherlands


        Abstract. We present a practical method for explaining deep learning-based
        text mining with ontology-based information. Our approach uses the
        recently proposed OntoSenticNet sentiment ontology and consists of a
        composite deep learning classifier for sentiment mining, endowed with an
        ontology-driven attention module. The attention module analyzes the
        attention the neural network pays to semantic labels assigned to bigrams
        in input texts.


1     Introduction and approach
Deep learning continues to achieve state-of-the-art performance in a variety of
domains, such as image analysis and text mining. Despite this success, deep
learning models remain opaque: it is hard to understand what knowledge is
represented in them and how they arrive at their decisions (see [3] for a
discussion). The field of explainable AI is increasingly gaining traction, and
promising results have been reported with attention-based models [4] and
latent-space analysis [7]. The link between ontologies and deep learning is also
actively being explored. For instance, [6] addresses the extraction of OWL
information from raw text with deep learning, and [2] applies deep learning to
ontology reasoning. Our approach attempts to leverage the semantic information in
ontologies to explain deep text mining, using neural attention and word
embeddings. Ontologies usually contain structured, encyclopedic knowledge,
arranged in a semantic, conceptual structure. One such ontology is the recently
proposed sentiment ontology OntoSenticNet [1], an extension of the SenticNet
ontology. SenticNet (Figure 1(a)) links entities via an intermediate concept
level (consisting of semantic categories and relations) to an affective level
describing sentiment-based associations, such as sadness or joy. OntoSenticNet
uses SenticNet to derive affective associations for words and phrases. It is
automatically compiled from affective analyses performed with WordNet-Affect,
Open Mind Common Sense and GECKA. Figure 1(b) lists the OntoSenticNet entry for
“wrong food”. The primitiveURI nodes contain the affective labels associated
with the multi-word expression “wrong food”. The semantics nodes express
associations with other NamedIndividuals (expressions), based on corpus evidence
such as collocations, and on the static
knowledge contained in SenticNet.

    * The research reported in this paper has been carried out within the Research
    Programme Applied AI of TNO.

    Fig. 1. SenticNet, OntoSenticNet, our processing pipeline and model architecture:
    (a) SenticNet; (b) OntoSenticNet entry for ‘wrong food’; (c) process flow; (d) model.

We embed the ontology information into the sentiment analysis process directly,
combining it with non-ontological information such as textual features. Taking
advantage of the attention a neural network
pays to the extra ontology-based information will allow us to decompose its
decisions semantically. We start (Figure 1(c)) by generating vector
representations of our input data, using 100-dimensional GloVe vectors [5],
which were trained on 6 billion tokens from a 2014 Wikipedia dump and the
Gigaword 5 corpus.
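The pre-trained GloVe vectors are distributed as a plain-text file in which each line holds a token followed by its vector components, separated by spaces. A minimal loader for that format is sketched below; the helper name `load_glove` and the file name `glove.6B.100d.txt` in the usage note are our own assumptions for illustration, not part of the described system.

```python
import io
from typing import Dict, TextIO

import numpy as np


def load_glove(stream: TextIO, dim: int = 100) -> Dict[str, np.ndarray]:
    """Read GloVe's text format: one token followed by `dim` floats per line.

    Hypothetical helper; lines with an unexpected number of fields are skipped.
    """
    vectors: Dict[str, np.ndarray] = {}
    for line in stream:
        fields = line.rstrip().split(" ")
        if len(fields) != dim + 1:
            continue  # empty/malformed line or wrong dimensionality
        token = fields[0]
        vectors[token] = np.array([float(x) for x in fields[1:]], dtype=np.float32)
    return vectors


# Toy 3-dimensional "file" standing in for the real 100-dimensional one.
toy_file = io.StringIO("food 0.1 0.2 0.3\nwrong -0.5 0.0 0.25\n")
glove = load_glove(toy_file, dim=3)
```

With the real file, one would call `load_glove(f)` on a handle opened with `open("glove.6B.100d.txt", encoding="utf-8")`; the resulting dictionary is what later steps (document averaging, nearest-neighbour expansion) would look words up in.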
Every document is represented as the sum of the GloVe vectors of its constituent
words, normalized for the length of the document. Subsequently, we split every
document into bigrams and perform a beam search over the semantically labeled
bigrams in the OntoSenticNet ontology. As semantic labels for bigrams, we use the
primitiveURI labels; every distinct combination of these labels in OntoSenticNet
yields a unique joint label (such as joy#surprise). To cater for bigrams without
overt affective labels, we randomly sampled 5,000 bigrams from a BBC news corpus
(http://mlg.ucd.ie/datasets/bbc.html) and labeled these bigrams as 'bbc'. This
approach yields a unique set of semantic labels for every dataset we use.
Restricting our use of OntoSenticNet to bigrams allows us to look for contextual
matches rather than word-based matches without running into sparsity:
OntoSenticNet contains 22,935 bigram expressions and only 3,104 expressions
longer than two words. The majority of OntoSenticNet entries consist of unigrams
(26,912 entries). The beam search operation attempts to
retrieve an existing OntoSenticNet bigram for any of the 100 combinations of the
10 most similar words for each word in the bigram. For example, 'bad dinner' is
not in OntoSenticNet, but one of its GloVe expansions ('wrong food') is. Once
such a hit is found, the beam search stops for the given input bigram, the
semantic labels are retrieved from OntoSenticNet, and the search proceeds with
the next bigram in the document. The relation between an OntoSenticNet bigram
and its labels is stored as an entry in a dictionary. The attested semantic
labels of the bigrams in a document are counted, and for every document a count
vector is generated and stored; its length equals the total number of labels
attested in the corpus. After a labeled text corpus has been processed in this
manner, every document in the corpus is represented by two vectors: a
GloVe-based vector, and a count vector holding the counts of the semantic labels
that apply to the bigrams in the document. Subsequently, we train a neural
network (Figure 1(d)) on these joint representations of labeled documents. The
network has two branches, each with its own input layer. The first branch
processes the ontology label vectors and computes attention scores
(probabilities) for the various labels in the vectors. These attention scores
indicate the importance ('attention') the network assigns to the ontology
labels. They are merged with the GloVe vectors by concatenation, and this derived
representation is used by the second branch to learn to assign sentiment labels
to documents. The attention probabilities are optimized end-to-end during this
process (they are part of the overall weight optimization problem the network
solves). Once learning is complete, the attention scores that the trained
network computes for each test document are extracted, and an image is
generated that displays them. We applied our system to a variety of sentiment
labeling datasets: a set of UCI datasets² comprising Yelp, Amazon product and
IMDB movie reviews. In addition, we trained and tested on a subjectivity
dataset³.
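The feature extraction described above, which turns each document into a GloVe-based vector and a label count vector, can be sketched as follows. This is a minimal illustration under several assumptions of ours: bigrams are taken as overlapping adjacent token pairs; "normalized for the length of the document" is read as dividing by the number of in-vocabulary tokens; nearest neighbours are found by brute-force cosine similarity; the 10×10 candidate grid (the text's beam search) is scanned in neighbour-rank order, after trying the bigram itself, with early stopping; and a bigram's primitiveURI labels are joined with '#' into one joint label. The embeddings and ontology entries are toy placeholders, not real GloVe or OntoSenticNet data, and all function names are hypothetical.

```python
from itertools import product
from typing import Dict, List, Optional, Sequence, Tuple

import numpy as np

Bigram = Tuple[str, str]
Embeddings = Dict[str, np.ndarray]
Ontology = Dict[Bigram, Tuple[str, ...]]  # bigram -> primitiveURI labels (toy)


def doc_vector(tokens: Sequence[str], emb: Embeddings, dim: int) -> np.ndarray:
    """Sum of word vectors divided by the number of in-vocabulary tokens (assumed)."""
    vecs = [emb[t] for t in tokens if t in emb]
    if not vecs:
        return np.zeros(dim)
    return np.sum(vecs, axis=0) / len(vecs)


def most_similar(word: str, emb: Embeddings, k: int = 10) -> List[str]:
    """The k nearest neighbours of `word` by cosine similarity (brute force)."""
    if word not in emb:
        return []
    query = emb[word] / np.linalg.norm(emb[word])
    scored = [
        (float(query @ vec) / float(np.linalg.norm(vec)), other)
        for other, vec in emb.items()
        if other != word
    ]
    scored.sort(reverse=True)
    return [other for _, other in scored[:k]]


def lookup_bigram(bigram: Bigram, emb: Embeddings, ontology: Ontology,
                  k: int = 10) -> Optional[str]:
    """Joint label of the first ontology hit among the bigram and its k*k expansions."""
    expansions = product(most_similar(bigram[0], emb, k),
                         most_similar(bigram[1], emb, k))
    for cand in [bigram, *expansions]:
        if cand in ontology:
            return "#".join(ontology[cand])  # stop at the first hit
    return None


def label_count_vector(tokens: Sequence[str], emb: Embeddings, ontology: Ontology,
                       label_index: Dict[str, int], k: int = 10) -> np.ndarray:
    """Count the joint labels found for the document's adjacent token pairs."""
    counts = np.zeros(len(label_index))
    for bigram in zip(tokens, tokens[1:]):
        label = lookup_bigram(bigram, emb, ontology, k)
        if label is not None and label in label_index:
            counts[label_index[label]] += 1
    return counts


# Toy 2-d embeddings and a one-entry toy "ontology" with placeholder labels.
emb = {w: np.array(v) for w, v in {
    "bad": [1.0, 0.1], "wrong": [0.9, 0.2], "dinner": [0.1, 1.0],
    "food": [0.2, 0.9], "the": [0.5, 0.5],
}.items()}
ontology = {("wrong", "food"): ("LABEL_A", "LABEL_B")}
label_index = {"LABEL_A#LABEL_B": 0, "bbc": 1}

tokens = ["bad", "dinner"]
glove_part = doc_vector(tokens, emb, dim=2)
label_part = label_count_vector(tokens, emb, ontology, label_index, k=2)
```

In this toy run, 'bad dinner' is absent from the toy ontology, but its expansion ('wrong', 'food') is found, mirroring the example in the text. A real implementation would replace the brute-force neighbour search with an indexed nearest-neighbour structure, since scanning a 400,000-word vocabulary per word is slow.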


2     Results
Some illustrative results are shown in Figure 2. For the complex emotion
expressed in the sentence “The only thing I did like was the prime rib and the
dessert section”, the OntoSenticNet labels anger, sadness, disgust and surprise
score relatively high. The sentences “We’d definitely go back here again” and
“Will go back next trip out” both score high for the joint label joy#surprise.
The negative sentiment of “...least think to refill my water before I struggle
to wave you over for 10 minutes” is strongly underpinned by the disgust and
anger labels. The attention probabilities extracted from our classifier may thus
serve to decompose monadic sentiment labels into much richer and more varied
descriptions, enhancing the explainability of monadic sentiment labeling. In
future work, we will assess the explanatory advantages of our system by
submitting the generated analyses to human evaluators in a task-based evaluation
setting, and by displaying the underlying words and phrases used by the model
for sentiment decomposition. Our code will be shared at
https://github.com/stephanraaijmakers/deeptext.

                       Fig. 2. Sample attention-based analyses.

² https://archive.ics.uci.edu/ml/datasets/Sentiment+Labelled+Sentences
³ https://www.cs.cornell.edu/people/pabo/movie-review-data/
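The data flow behind attention-based analyses such as those in Figure 2 can be illustrated with a forward pass through a two-branch network. The sketch below is a toy under our own assumptions, not the authors' model: the attention branch is taken to be a single dense layer with a softmax over the label dimensions, its probabilities are concatenated with the GloVe vector (as the text describes), and the sentiment branch is a single dense softmax layer. The weights are random and untrained (the described system learns them end-to-end), so the printed scores only demonstrate how label attention would be read out; the class and function names are hypothetical, and the label names are borrowed from the examples above.

```python
from typing import List, Sequence, Tuple

import numpy as np


def softmax(z: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(z - z.max())
    return e / e.sum()


class TwoBranchSketch:
    """Hypothetical forward pass: label counts -> attention; [attention; GloVe] -> class."""

    def __init__(self, n_labels: int, glove_dim: int, n_classes: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        # Branch 1: dense layer producing one attention logit per ontology label.
        self.w_att = rng.normal(scale=0.1, size=(n_labels, n_labels))
        self.b_att = np.zeros(n_labels)
        # Branch 2: dense softmax classifier over the concatenated representation.
        self.w_out = rng.normal(scale=0.1, size=(n_labels + glove_dim, n_classes))
        self.b_out = np.zeros(n_classes)

    def forward(self, label_counts: np.ndarray, glove_vec: np.ndarray):
        attention = softmax(label_counts @ self.w_att + self.b_att)
        joint = np.concatenate([attention, glove_vec])  # merge by concatenation
        class_probs = softmax(joint @ self.w_out + self.b_out)
        return attention, class_probs


def top_labels(attention: np.ndarray, names: Sequence[str],
               k: int = 3) -> List[Tuple[str, float]]:
    """Labels ranked by attention probability, highest first."""
    order = np.argsort(attention)[::-1][:k]
    return [(names[i], float(attention[i])) for i in order]


names = ["joy#surprise", "anger", "disgust", "bbc"]
model = TwoBranchSketch(n_labels=len(names), glove_dim=100, n_classes=2)
attention, class_probs = model.forward(np.array([2.0, 0.0, 1.0, 3.0]), np.zeros(100))
for label, score in top_labels(attention, names):
    print(f"{label:>14s} {score:.3f}")
```

In practice, such a network would be built and trained in a deep learning framework so that the attention weights are optimized jointly with the classifier; the ranked list printed here plays the role of the per-document attention image.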


References
1. Dragoni, M., Poria, S., Cambria, E.: OntoSenticNet: A commonsense ontology for
   sentiment analysis. IEEE Computational Intelligence Magazine 33(2) (2018)
2. Hohenecker, P., Lukasiewicz, T.: Deep learning for ontology reasoning. CoRR
   abs/1705.10342 (2017)
3. Lipton, Z.C.: The mythos of model interpretability. CoRR abs/1606.03490 (2016)
4. Liu, S., Wang, X., Liu, M., Zhu, J.: Towards better analysis of machine learning
   models: A visual analytics perspective. Visual Informatics 1(1), 48–56 (2017)
5. Pennington, J., Socher, R., Manning, C.: GloVe: Global vectors for word
   representation. In: EMNLP. vol. 14, pp. 1532–1543 (2014)
6. Petrucci, G., Ghidini, C., Rospocher, M.: Ontology learning in the deep. In: 20th
   International Conference on Knowledge Engineering and Knowledge Management,
   Volume 10024. pp. 480–495. EKAW 2016, Springer-Verlag New York, Inc., New York,
   NY, USA (2016)
7. Raaijmakers, S., Sappelli, M., Kraaij, W.: Investigating the interpretability of
   hidden layers in deep text mining. In: SEMANTICS. pp. 177–180. ACM (2017)