<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Exploiting ontologies for deep learning: a case for sentiment mining</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Stephan</forename><surname>Raaijmakers</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Data Science Department</orgName>
								<orgName type="institution">TNO</orgName>
								<address>
									<addrLine>Anna van Buerenplein 1, The Hague</addrLine>
									<postCode>2595 DA</postCode>
									<country key="NL">The Netherlands</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Christopher</forename><surname>Brewster</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Data Science Department</orgName>
								<orgName type="institution">TNO</orgName>
								<address>
									<addrLine>Anna van Buerenplein 1, The Hague</addrLine>
									<postCode>2595 DA</postCode>
									<country key="NL">The Netherlands</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Exploiting ontologies for deep learning: a case for sentiment mining</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">EFCDF290A8EE63D121C75CF30043D4CB</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-25T08:03+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>We present a practical method for explaining deep learning-based text mining with ontology-based information. Our approach uses the recently proposed OntoSenticNet ontology for sentiment mining, and consists of a composite deep learning classifier for sentiment mining, endowed with an ontology-driven attention module. The attention module analyzes the attention the neural network pays to semantic labels assigned to bigrams in input texts.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction and approach</head><p>Deep learning continues to achieve state-of-the-art performance in a variety of domains, such as image analysis and text mining. Despite this success, deep learning models remain elusive: it is hard to understand what knowledge they represent and how they reach their decisions (see <ref type="bibr" target="#b2">[3]</ref> for discussion). The field of explainable AI is gaining traction, and promising results have been reported with attention-based models <ref type="bibr" target="#b3">[4]</ref> and latent-space analysis <ref type="bibr" target="#b6">[7]</ref>. The link between ontologies and deep learning is actively being explored: for instance, <ref type="bibr" target="#b5">[6]</ref> addresses the extraction of OWL information from raw text with deep learning, and <ref type="bibr" target="#b1">[2]</ref> applies deep learning to ontology extraction. Our approach attempts to leverage the semantic information in ontologies to explain deep text mining, using neural attention and word embeddings. Ontologies usually contain structured, encyclopedic knowledge, arranged in a semantic, conceptual structure. One such ontology is the recently proposed sentiment ontology OntoSenticNet <ref type="bibr" target="#b0">[1]</ref>, an extension of the SenticNet ontology. SenticNet (Figure <ref type="figure" target="#fig_0">1(a)</ref>) links entities via an intermediate concept level (consisting of semantic categories and relations) to an affective level describing sentiment-based associations, like sadness or joy. OntoSenticNet uses SenticNet to derive affective associations for words and phrases. It is automatically compiled from affective analyses performed with WordNet-Affect, Open Mind Common Sense and GECKA. Figure <ref type="figure" target="#fig_0">1</ref>(b) lists the OntoSenticNet entry for "wrong food". 
The primitiveURI nodes contain the affective labels associated with the multi-word expression "wrong food". The semantics nodes express associations with other NamedIndividuals (expressions), based on corpus-based evidence such as collocations, and on the static knowledge contained in SenticNet. We embed the ontology information directly into the sentiment analysis process, combining it with non-ontological information such as textual features. Taking advantage of the attention a neural network pays to the extra ontology-based information allows us to decompose its decisions semantically. We start (Figure <ref type="figure" target="#fig_0">1</ref>(c)) by generating vector representations of our input data, using 100-dimensional GloVe vectors <ref type="bibr" target="#b4">[5]</ref>, derived from 6 billion words of a 2014 English fragment of Wikipedia. Every document is represented as the sum of the GloVe vectors of its constituent words, normalized by document length. Subsequently, we chunk every document into bigrams, and perform a beam search over the semantically labeled bigrams in the OntoSenticNet ontology. As semantic labels for bigrams, we use the primitiveURI labels; every combination in OntoSenticNet generates a unique label. To cater for bigrams without overt affective labels, we randomly took 5,000 bigrams from a BBC news corpus<ref type="foot" target="#foot_0">1</ref>, and labeled these bigrams as 'bbc'. This approach yields, for every dataset we use, a unique set of semantic labels. Restricting our use of OntoSenticNet to bigrams allows us to look for contextual matches rather than word-based matches, without running into sparsity: OntoSenticNet contains 22,935 bigram expressions, and only 3,104 expressions longer than 2 words. The majority of OntoSenticNet entries consists of unigrams (26,912 entries). 
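As a minimal sketch of this representation step, the length-normalized GloVe document vector and the bigram chunking can be written as follows; the `glove` table here is a hypothetical stand-in for the real 100-dimensional embeddings:

```python
import numpy as np

# Hypothetical embedding table: word -> 100-dimensional vector
# (stands in for the pretrained GloVe vectors).
glove = {w: np.random.rand(100) for w in "the prime rib was wrong food".split()}
DIM = 100

def doc_vector(tokens, embeddings, dim=DIM):
    """Sum the embedding vectors of a document's words, normalized by document length."""
    vec = np.zeros(dim)
    for tok in tokens:
        vec += embeddings.get(tok, np.zeros(dim))
    return vec / max(len(tokens), 1)

def bigrams(tokens):
    """Chunk a document into consecutive bigrams."""
    return [(tokens[i], tokens[i + 1]) for i in range(len(tokens) - 1)]

doc = "the prime rib was wrong food".split()
v = doc_vector(doc, glove)
print(v.shape)           # (100,)
print(bigrams(doc)[:2])  # [('the', 'prime'), ('prime', 'rib')]
```

Out-of-vocabulary words simply contribute a zero vector here; the actual handling of unknown words is not specified in the text.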
The beam search operation attempts to retrieve, for any combination (100 in total) of the 10 most similar words per word in the bigram, an existing bigram from OntoSenticNet. For example, 'bad dinner' is not in OntoSenticNet, but one of its GloVe expansions ('wrong food') is. Once such a hit is found, the beam search stops for the given input bigram, the semantic labels are picked up from OntoSenticNet, and search proceeds with the next bigram in the document. The relation between an OntoSenticNet bigram and its labels is stored as an entry in a dictionary. The attested semantic labels for every bigram in a document are counted, and for every document, a count vector (whose length is the total number of labels attested in the corpus) is generated and stored. After processing a labeled text corpus in this manner, every document in the corpus is represented by two vectors: a GloVe-based vector, and a count vector describing the counts of the semantic labels that apply to the bigrams in the document. Subsequently, we train a neural network (Figure <ref type="figure" target="#fig_0">1</ref>(d)) on these joint representations of labeled documents. The network has two branches, each equipped with a separate input layer. The first branch processes the ontology label vectors and computes attention scores (probabilities) for the various labels in the vectors. These attention scores indicate the importance ('attention') the network pays to the ontology labels. They are merged with the GloVe vectors by concatenation, and this derived representation is used by the second branch to learn the labeling of documents with sentiment labels. The attention probabilities are optimized end-to-end during this process (they are part of the overall weight optimization problem the network solves). Once learning is complete, the attention scores the trained network computes for every test document are extracted from the network, and an image is generated that displays the scores. 
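The beam search over GloVe expansions and the resulting label counts can be sketched as follows; the `neighbours` table and the single OntoSenticNet entry are illustrative stand-ins for the real resources, and including each word among its own expansions is an assumption:

```python
from collections import Counter
from itertools import product

# Hypothetical stand-ins: an OntoSenticNet lookup (bigram -> primitiveURI
# labels) and a GloVe nearest-neighbour table.
onto_bigrams = {("wrong", "food"): ["disgust#anger"]}
neighbours = {
    "bad":    ["wrong", "poor", "terrible"],
    "dinner": ["food", "meal", "lunch"],
}

def expand(word, k=10):
    """Return the word plus up to k of its GloVe neighbours."""
    return [word] + neighbours.get(word, [])[:k]

def label_bigram(bigram, ontology, k=10):
    """Beam search: try combinations of neighbours of the two words and stop
    at the first combination that is a known OntoSenticNet bigram."""
    for w1, w2 in product(expand(bigram[0], k), expand(bigram[1], k)):
        if (w1, w2) in ontology:
            return ontology[(w1, w2)]
    return ["bbc"]  # fallback label for bigrams without affective coverage

def label_counts(doc_bigrams, ontology):
    """Semantic-label count vector (as a Counter) for one document."""
    counts = Counter()
    for bg in doc_bigrams:
        counts.update(label_bigram(bg, ontology))
    return counts

print(label_bigram(("bad", "dinner"), onto_bigrams))  # ['disgust#anger']
```

In the paper's setup the 'bbc' fallback labels come from the separately sampled BBC bigrams; here the fallback is collapsed into the search for brevity.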
We applied our system to a variety of sentiment labeling datasets: a set of UCI datasets<ref type="foot" target="#foot_1">2</ref> comprising Yelp, Amazon product and IMDB movie reviews. In addition, we trained and tested on a subjectivity dataset<ref type="foot" target="#foot_2">3</ref>.</p></div>
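The two-branch network described above can be sketched in the Keras functional API; the layer sizes, label count, and binary sentiment output are illustrative assumptions, not the paper's exact configuration:

```python
import numpy as np
from tensorflow.keras import layers, Model

n_labels, glove_dim = 50, 100  # assumed dimensions for illustration

onto_in  = layers.Input(shape=(n_labels,), name="onto_labels")
glove_in = layers.Input(shape=(glove_dim,), name="glove_doc")

# Branch 1: attention probabilities over the ontology labels.
attention = layers.Dense(n_labels, activation="softmax", name="attention")(onto_in)

# Merge the attention scores with the GloVe document vector by concatenation;
# branch 2 learns the sentiment labeling from the merged representation.
merged = layers.Concatenate()([attention, glove_in])
hidden = layers.Dense(64, activation="relu")(merged)
output = layers.Dense(1, activation="sigmoid", name="sentiment")(hidden)

model = Model(inputs=[onto_in, glove_in], outputs=output)
model.compile(optimizer="adam", loss="binary_crossentropy")

# After training, the attention scores for a test document can be read out
# via a sub-model that stops at the attention layer.
attn_model = Model(inputs=onto_in, outputs=attention)
scores = attn_model.predict(np.ones((1, n_labels)), verbose=0)
print(scores.shape)  # (1, 50)
```

Because the attention layer sits on the path to the sentiment output, its weights are optimized end-to-end with the rest of the network, as the text describes.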
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Results</head><p>Some illustrative results are listed in Figure <ref type="figure" target="#fig_1">2</ref>. For the complex emotion expressed in the sentence The only thing I did like was the prime rib and the dessert section, the OntoSenticNet labels anger, sadness, disgust and surprise score relatively high. The sentences We'd definitely go back here again and Will go back next trip out both score high for the joint label joy#surprise. The negative sentiment of ...least think to refill my water before I struggle to wave you over for 10 minutes is significantly underpinned by disgust and anger labels. The attention probabilities extracted from our classifier may thus serve to decompose monadic sentiment labels into much richer and more varied descriptions, enhancing the explainability of monadic sentiment labeling. The explanatory advantages of our system will be assessed in future work by submitting the generated analyses to human evaluators in a task-based evaluation setting, and by displaying the underlying words and phrases the model uses for sentiment decomposition. Our code will be shared at https://github.com/stephanraaijmakers/deeptext.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Fig. 1.</head><label>1</label><figDesc>Fig. 1. SenticNet, OntoSenticNet, our processing pipeline and model architecture.</figDesc><graphic coords="2,185.65,284.72,129.45,126.52" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Fig. 2 .</head><label>2</label><figDesc>Fig. 2. Sample attention-based analyses.</figDesc></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">http://mlg.ucd.ie/datasets/bbc.html</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_1">https://archive.ics.uci.edu/ml/datasets/Sentiment+Labelled+Sentences</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_2">https://www.cs.cornell.edu/people/pabo/movie-review-data/</note>
		</body>
		<back>

			<div type="funding">
<div xmlns="http://www.tei-c.org/ns/1.0"><p>The research reported in this paper has been carried out within the Research Programme Applied AI of TNO.</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">OntoSenticNet: A commonsense ontology for sentiment analysis</title>
		<author>
			<persName><forename type="first">M</forename><surname>Dragoni</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Poria</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Cambria</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Computational Intelligence Magazine</title>
		<imprint>
			<biblScope unit="volume">33</biblScope>
			<biblScope unit="issue">2</biblScope>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<monogr>
		<title level="m" type="main">Deep learning for ontology reasoning</title>
		<author>
			<persName><forename type="first">P</forename><surname>Hohenecker</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Lukasiewicz</surname></persName>
		</author>
		<idno>CoRR abs/1705.10342</idno>
		<imprint>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<monogr>
		<title level="m" type="main">The mythos of model interpretability</title>
		<author>
			<persName><forename type="first">Z</forename><forename type="middle">C</forename><surname>Lipton</surname></persName>
		</author>
		<idno>CoRR abs/1606.03490</idno>
		<imprint>
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Towards better analysis of machine learning models: A visual analytics perspective</title>
		<author>
			<persName><forename type="first">S</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Zhu</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Visual Informatics</title>
		<imprint>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="48" to="56" />
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Glove: Global vectors for word representation</title>
		<author>
			<persName><forename type="first">J</forename><surname>Pennington</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Socher</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Manning</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">EMNLP</title>
		<imprint>
			<biblScope unit="volume">14</biblScope>
			<biblScope unit="page" from="1532" to="1543" />
			<date type="published" when="2014-01">01 2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Ontology learning in the deep</title>
		<author>
			<persName><forename type="first">G</forename><surname>Petrucci</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Ghidini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Rospocher</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">20th International Conference on Knowledge Engineering and Knowledge Management - Volume 10024</title>
				<meeting><address><addrLine>New York, NY, USA</addrLine></address></meeting>
		<imprint>
			<publisher>Springer-Verlag New York, Inc</publisher>
			<date type="published" when="2016">2016</date>
			<biblScope unit="page" from="480" to="495" />
		</imprint>
	</monogr>
	<note>EKAW 2016</note>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Investigating the interpretability of hidden layers in deep text mining</title>
		<author>
			<persName><forename type="first">S</forename><surname>Raaijmakers</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Sappelli</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Kraaij</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">SEMANTICS</title>
				<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="2017">2017</date>
			<biblScope unit="page" from="177" to="180" />
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
