<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Conceptual Shadows: Visualizing Concept-specific Dimensions of Meaning in Word Embeddings with Self Organizing Maps</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Laura</forename><surname>Spillner</surname></persName>
							<email>laura.spillner@uni-bremen.de</email>
							<affiliation key="aff0">
								<orgName type="department">Digital Media Lab</orgName>
								<orgName type="institution">University of Bremen</orgName>
								<address>
									<addrLine>Bibliothekstr. 5</addrLine>
									<postCode>28359</postCode>
									<settlement>Bremen</settlement>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Robert</forename><surname>Porzel</surname></persName>
							<email>porzel@uni-bremen.de</email>
							<affiliation key="aff0">
								<orgName type="department">Digital Media Lab</orgName>
								<orgName type="institution">University of Bremen</orgName>
								<address>
									<addrLine>Bibliothekstr. 5</addrLine>
									<postCode>28359</postCode>
									<settlement>Bremen</settlement>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Robin</forename><surname>Nolte</surname></persName>
							<email>nolte@uni-bremen.de</email>
							<affiliation key="aff0">
								<orgName type="department">Digital Media Lab</orgName>
								<orgName type="institution">University of Bremen</orgName>
								<address>
									<addrLine>Bibliothekstr. 5</addrLine>
									<postCode>28359</postCode>
									<settlement>Bremen</settlement>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Rainer</forename><surname>Malaka</surname></persName>
							<email>malaka@tzi.de</email>
							<affiliation key="aff0">
								<orgName type="department">Digital Media Lab</orgName>
								<orgName type="institution">University of Bremen</orgName>
								<address>
									<addrLine>Bibliothekstr. 5</addrLine>
									<postCode>28359</postCode>
									<settlement>Bremen</settlement>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Conceptual Shadows: Visualizing Concept-specific Dimensions of Meaning in Word Embeddings with Self Organizing Maps</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">7D007F4B7D049EEFB11A7AE3090871B8</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T19:27+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Word-Embeddings</term>
					<term>Ontologies</term>
					<term>Language Processing</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Word embeddings (high-dimensional vectors) are common input representations in NLP. However, this kind of representation is not meaningful to humans; it presents a black box that makes it difficult to explain how the vectors influence downstream models. Visualizing word vectors usually requires dimensionality reduction. We explore the visualization of word vectors as 2D images (one image per word, one pixel per vector dimension) by organizing the dimensions in the image with a self-organizing map. This method reveals new insights into how and where semantic information is encoded in the vector and allows us to pinpoint the source of downstream classification errors in the input representation. In this paper, we present the first results of an investigation into word embeddings that visualizes individual word vectors as images and explores what information the individual dimensions of the vectors encode. As this encoded information is specific to the given target concepts of a symbolic downstream classification task, it can be regarded as a projection from the symbolic space to that of the deep neural network.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>Undoubtedly, both symbolic and sub-symbolic approaches to artificial intelligence (AI) have their respective merits and individual shortcomings. In many applications, they are already joined at the hip, as the output of deep learning models often consists of classes that are symbolically described and used further on in some overall processing pipeline. One of the main areas of interest in the field of explainable artificial intelligence (XAI), and arguably one of the driving factors of recent interest in the field, is the explanation of black-box deep learning models. In natural language processing (NLP), deep neural networks (DNNs) are used in two ways: Firstly, to produce numeric input representations from natural language texts, and secondly, to solve downstream tasks, e.g., classification, clustering, or language generation. In many of these downstream tasks, conceptual models of the task-specific domain, such as ontologies, constitute the target representations for the classification.</p><p>For example, when classifying the part of speech (POS) of the words used in a sentence, specific classes are used as the values of the POS attribute, e.g., Noun, Verb, or Adj. These values are often part of a conceptual model, e.g., an ontology of linguistic entities, such as the GOLD ontology <ref type="bibr" target="#b0">[1]</ref>, the OntoWordNet model <ref type="bibr" target="#b1">[2]</ref>, or the LingInfo model <ref type="bibr" target="#b2">[3]</ref>. In many cases, therefore, sub-symbolic approaches are used to classify entities stemming from some ontological model. In these sub-symbolic approaches, representations in which one word constitutes one symbol, such as bag-of-words or n-gram models, have largely been replaced by distributed semantic representations -also called word embeddings or word vectors -to represent text. 
It is generally accepted that the embeddings encode semantic information about a word and that words close to each other in the vector space are similar in meaning <ref type="bibr" target="#b3">[4,</ref><ref type="bibr" target="#b4">5]</ref>. However, high-dimensional word vectors pose difficulty from the XAI perspective because they essentially add a second black box, the model learning the embeddings, around the model used for the task itself.</p><p>When it comes to fields such as computer vision, many techniques have been developed to explain DNNs, e.g., by generating example images of the classes they are trained to identify or by highlighting image areas of particular importance in the classification <ref type="bibr" target="#b5">[6]</ref>. Even though the input is represented numerically, the representation (the digital image) is still meaningful to humans. In contrast, a word vector as a point in very high-dimensional vector space is rather difficult to imagine or to represent visually. Because of this, visual explanations in the field of NLP usually fall into one of two categories: One option is to use dimensionality reduction to represent word vectors as points in 2D space, thus making it possible to see which words are close together. The other option is to consider not individual words but rather texts and highlight words as salient features, e.g., when predicting the topic of a text <ref type="bibr" target="#b6">[7]</ref>.</p><p>In this paper, we present the first results of an investigation into word embeddings that takes a different approach: We visualize individual word vectors as images and, inspired by XAI methods from computer vision, explore what information the individual dimensions of the vectors encode. This encoded information is specific to the given target concepts of the downstream classification task at hand. It can be regarded as a projection from the symbolic conceptual space to that of the DNN. 
For each conceptual entity, e.g., Noun or Verb, Cat or Dog, etc., we obtain its visual projection into the sub-symbolic space. We call this the conceptual shadow of that entity. One application for this approach is to improve understanding of the input representations we use for NLP tasks. We hope to utilize this method to understand the origin of mistakes in the downstream model, such as incorrect classifications where a given ontological model constitutes the target representation. In the long run, this work seeks to connect sub-symbolic and symbolic representations of the same conceptual entity.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Related Work</head><p>In this section, we provide short overviews of prior art with respect to ontological models of linguistic knowledge, word embeddings, and explainable natural language processing.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1.">Modeling Linguistic Knowledge</head><p>Various approaches have been proposed to model linguistic knowledge, i.e., the entities and features that make up human language, in formal ontologies. These approaches differ in some respects, such as alignment to upper layers, their modeling intent, and their scope. One point of divergence lies in the alignment to a foundational layer. While, for example, the GOLD ontology <ref type="bibr" target="#b0">[1]</ref> aligns with the SUMO upper ontology <ref type="bibr" target="#b7">[8]</ref>, the OntoWordNet model <ref type="bibr" target="#b1">[2]</ref> aligns with the DOLCE foundational ontology <ref type="bibr" target="#b8">[9]</ref>. The LingInfo model <ref type="bibr" target="#b2">[3]</ref> can be used with any foundational framework, as it relies on meta-classes to model information about the lexical entities. To also represent pragmatically relevant information, the SOMA-SAY model <ref type="bibr" target="#b9">[10]</ref> is based on DOLCE Ultra Light and the Descriptions &amp; Situations module <ref type="bibr" target="#b10">[11]</ref>. In contrast, OntoWordNet aims to merge the linguistic information contained in WordNet with the respective classes employed in specific domain models, while both LingInfo and GOLD seek to incorporate more linguistic information, such as morphological and grammatical features of language. All of these models allow a direct connection of the respective linguistic information for terms with corresponding classes and properties in a domain ontology. Each model could be integrated into an NLP system as an additional module to allow reasoning about linguistic information or to serve as a link between lexical and ontological resources.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2.">Word Embeddings</head><p>Semantic embeddings have become standard input representations for many machine learning NLP tasks. Since the conception of word vectors in <ref type="bibr" target="#b3">[4]</ref>, improvements have been made with the introduction of character-based models and contextual representations <ref type="bibr" target="#b11">[12,</ref><ref type="bibr" target="#b12">13]</ref>, which allow fine-tuning of pre-trained embeddings for downstream tasks <ref type="bibr" target="#b11">[12,</ref><ref type="bibr" target="#b13">14]</ref>, as well as with the addition of transformer-based models <ref type="bibr" target="#b14">[15]</ref> and attention mechanisms <ref type="bibr" target="#b15">[16]</ref>. For this work, it is mainly important to differentiate between static representations, used in older models such as GloVe embeddings <ref type="bibr" target="#b4">[5]</ref>, and dynamic embeddings, which are part of language models like BERT <ref type="bibr" target="#b11">[12]</ref>. With static embeddings, the same word is invariably represented by the same vector -the representation does not differ between different uses of the same word, e.g., homonyms or the same spelling used as different parts of speech. These static word vectors are then used as the input representation for downstream tasks. In contrast, when using dynamic embeddings, each use of the word in a text is represented by a different vector. Language models still represent each token in a text as a unique vector, but these vectors are not generally intended to be accessible from outside the model.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.3.">Explainable NLP</head><p>The XAI literature differentiates between three types of explanations <ref type="bibr" target="#b6">[7,</ref><ref type="bibr" target="#b5">6]</ref>:</p><p>1. Explanations of network processing, including, e.g., Linear Proxy Models such as LIME <ref type="bibr" target="#b16">[17]</ref>; salience mapping through occlusion <ref type="bibr" target="#b17">[18]</ref>; etc. 2. Explanations of representations by probing the role of individual layers or individual neurons, for example, to generate images that maximize the activation of a given neuron and can be seen as prototypical examples of a given class <ref type="bibr" target="#b18">[19,</ref><ref type="bibr" target="#b19">20]</ref>. 3. Systems that produce explanations.</p><p>Many works use explainable NLP in the third category to explain other models <ref type="bibr" target="#b20">[21]</ref>. However, the focus of this work is different: Instead, we aim to explore on a deeper level where conceptual information is encoded in distributed semantic representations and which part of the information might be the cause for downstream symbolic predictions. Much of the work on explanations in NLP, especially when it comes to visual explanations, either utilizes dimensionality reduction or highlights salient features on the scale of words in a text <ref type="bibr" target="#b6">[7]</ref>. However, it is not strictly necessary to reduce the dimensions of a word vector to visualize it. We tend to think of embeddings as vectors in high-dimensional space (e.g., 300 dimensions for GloVe embeddings) so that similar words are close to each other in this space. Yet a single word vector only consists of 300 numbers, while the numeric representation of an image might be made up of 6.000.000 numbers (a 1000px by 2000px RGB image). 
A word vector can easily be visualized as a kind of "barcode" of colors, with all 300 numbers arrayed in one dimension, the value of each number represented by the color. On this barcode, salient features (that is, the most critical dimensions in the vector) can easily be highlighted. This method has been used previously to produce visual explanations for NLP tasks by <ref type="bibr" target="#b21">[22]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Visualizing Word Embedding</head><p>The method presented in this paper is based on this same idea: Even though the individual dimensions of high-dimensional word embeddings do not obviously correspond to meaningful features to human eyes, they arguably still represent different features of the contexts a word usually appears in. By visualizing and analyzing these individual dimensions, we hypothesize that we can discover some clues as to which information is encoded where in the word embedding. A word vector can be visualized as a kind of "barcode" of colors -but to make it easier for the human eye to differentiate the individual dimensions, it might be helpful to visualize the same vector as an image, e.g., 300 numbers as a 300 pixel (15px by 20px) image. The main problem with this method is that humans will intuitively attribute meaning to the distance or closeness of individual pixels (e.g., "This area over there... "). This meaning, however, does not exist in reality, as the order of dimensions in the vector is random. Thus, we want to find a more meaningful organization of the dimensions of a word vector in a 2-dimensional space to visualize concept-specific areas of word embeddings as 2D images, so-called shadows.</p><p>To organize where in the image the dimensions of the vector should be placed, that is, which pixel corresponds to which dimension, a self-organizing map (SOM) <ref type="bibr" target="#b22">[23]</ref> presents an elegant solution. A SOM is trained on 𝑥 examples, each represented by 𝑦 features. The examples are then organized on a map: by means of unsupervised clustering (learning vector quantization), examples with similar feature vectors move closer together on the map, while dissimilar ones move apart. 
When it comes to words represented by word embeddings, a naive approach would be to take 𝑛 words as examples and their 𝑘-dimensional word embeddings as their feature representations, which would result in a SOM that can place words on a map based on their embeddings (here, the SOM would be a method of dimensionality reduction). Our use case is different: We want to organize the dimensions of the embedding on a map. Thus, the 𝑘 dimensions of the word vectors constitute the examples. The representations of these examples are the values of each dimension across the known words, meaning that each of the 𝑘 examples will be represented as an 𝑛-dimensional feature vector.</p><p>By training a SOM with as many neurons as the word embeddings have dimensions, it is possible to arrive at a model in which each dimension is recognized by exactly one of the neurons. Using this SOM, a word embedding (e.g., of the word 'the') can be visualized as an image with as many pixels as there are dimensions in the embedding. Each pixel is colored based on the value of the dimension that is associated with the corresponding neuron on the map.</p></div>
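The dimensions-as-examples setup can be sketched as follows. The paper does not name its SOM implementation, so this is a minimal plain-NumPy toy on random data; all sizes, the seed, and the names `E` and `pixel_of_dim` are illustrative (the real setup is 10.000 words by 300 GloVe dimensions on a 15x20 map).

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the real data: n words x k embedding dimensions.
n_words, k_dims = 200, 16
E = rng.normal(size=(n_words, k_dims))

# Each SOM example is one *dimension*, described by its values across
# all words, i.e. the transposed embedding matrix.
X = E.T  # shape (k_dims, n_words)

# A minimal 4x4 SOM (one neuron per dimension).
rows, cols = 4, 4
W = rng.normal(size=(rows * cols, n_words))
coords = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)

n_iter = 500
for t in range(n_iter):
    lr = 0.5 * (1 - t / n_iter)            # decaying learning rate
    sigma = 1.5 * (1 - t / n_iter) + 0.3   # decaying neighborhood radius
    x = X[rng.integers(k_dims)]            # one random dimension-example
    bmu = int(np.argmin(((W - x) ** 2).sum(axis=1)))   # best-matching unit
    d2 = ((coords - coords[bmu]) ** 2).sum(axis=1)     # squared grid distances
    h = np.exp(-d2 / (2 * sigma ** 2))                 # neighborhood kernel
    W += lr * h[:, None] * (x - W)

# The dimension-to-pixel map: each vector dimension is assigned to the
# grid cell of the neuron that responds to it most strongly.
pixel_of_dim = {
    d: tuple(coords[int(np.argmin(((W - X[d]) ** 2).sum(axis=1)))].astype(int))
    for d in range(k_dims)
}
```

In the toy setting a strict one-to-one matching of dimensions to neurons is not guaranteed; the paper reports tuning the training parameters empirically until that matching held.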
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">Projecting Static Embeddings</head><p>We first used this method to investigate static word embeddings: We analyzed the 300-dimensional GloVe embeddings provided by the open-source natural language library spaCy <ref type="bibr" target="#b23">[24]</ref>. The SOM is used to arrange the 300 dimensions of the word vectors in a (small) 2D image. Thus, the SOM provides a map encoding which pixel in the image represents which of the dimensions of the word vector. The pixel is colored based on the value of the word vector in the associated dimension. This means that the SOM is not used later for predictions on new examples -it is only used once to construct this dimension-to-pixel map, and is not required to generalize at all, as there are no other possible examples beyond the known vector dimensions.</p><p>We trained a 15x20 SOM on the 300-dimensional GloVe embeddings of 10.000 unique English words to analyze static embeddings. There are around 170.000 words in the English language, but not all have pre-trained GloVe embeddings. We constructed a dataset of words by first collecting all lemmas included in WordNet <ref type="bibr" target="#b24">[25]</ref> through NLTK <ref type="bibr" target="#b25">[26]</ref>. From this set, we identified the 64.466 words for which spaCy provides GloVe embeddings. Out of these, we took a random sample of 10.000 words on which to train the SOM, as we discovered through several trials that a corpus of 10.000 words is appropriate in terms of error and training time.</p><p>The training parameters of the SOM were adjusted empirically until the trained model arrived at a one-to-one matching of dimensions to neurons in the SOM (meaning that the SOM was able to correctly identify each dimension, as each neuron was trained to respond to exactly one of the dimensions). The trained SOM consistently achieved a quantization error of approx. 
0.0005 over 2000 training iterations.</p><p>Figure <ref type="figure" target="#fig_1">1</ref> shows the layout of the trained SOM and a number of examples of words represented with the resulting layout. It stands to reason that those dimensions with values far from zero (positive or negative) contribute the most information, while those close to zero are less important. Therefore, values at 0 are colored white, negative values red, and positive values blue. The distance map of the SOM shows that there is overall very little variation, except for a few outliers located in three regions. These same regions can be found in the images showing a number of example words, where these pixels stand out in red or blue.</p></div>
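The coloring scheme can be sketched like this; the dimension-to-pixel assignment and all sizes below are placeholders, not the paper's trained 15x20 map.

```python
import numpy as np

rng = np.random.default_rng(1)
rows, cols = 3, 4
k_dims = rows * cols
vec = rng.normal(size=k_dims)   # stand-in for one word's embedding

# Hypothetical dimension-to-pixel assignment as a trained SOM would provide.
pixel_of_dim = {d: divmod(d, cols) for d in range(k_dims)}

# Color scheme from the paper: 0 -> white, negative -> red, positive -> blue.
img = np.ones((rows, cols, 3))          # RGB image, initialized to white
vmax = float(np.abs(vec).max())
for d, (r, c) in pixel_of_dim.items():
    s = abs(vec[d]) / vmax              # saturation in [0, 1]
    if vec[d] < 0:
        img[r, c] = (1.0, 1.0 - s, 1.0 - s)   # shade of red
    else:
        img[r, c] = (1.0 - s, 1.0 - s, 1.0)   # shade of blue
```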
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.1.">Analysis of Individual Words from Static Embeddings</head><p>Most interesting about the SOM shown in Figure <ref type="figure" target="#fig_1">1a</ref> is that, while most of the neurons are relatively evenly spaced, there are several outliers -dimensions that are somehow more different from their neighbors than most. Most apparent are the pair of neurons corresponding to dimensions 140 and 105, the single neuron corresponding to dimension 86, and the cluster in the lower right corner. By comparing the map of the SOM in Figure <ref type="figure" target="#fig_1">1a</ref> to the examples in Figure <ref type="figure" target="#fig_1">1b</ref>, it becomes clear that the outlier neurons in the SOM correspond to those dimensions with greater absolute values than most. Manual inspection of random words and their shadows (Figure <ref type="figure" target="#fig_1">1b</ref> depicts a sample) revealed that in words of a comparatively high register ('elucidate'), the right pixel (105) of the pair stands out, and that in curse words, the left one (140) stands out strongly. We noticed that the two neurons 105 and 140, which appear as a pair in the SOM, never stand out together; it is always either one or the other that appears dark red. In some words (like 'the'), neither stands out. Moreover, these two pixels often appear in dark red (negative value) but never in dark blue (positive value).</p><p>We inspected several synonyms of high-register words, such as 'explain' instead of 'elucidate', and found that neither pixel stood out for those. Furthermore, we also inspected several informal words such as 'hi', and found that in those, pixel 140 stood out almost as strongly as for curse words. 
We hypothesized that these dimensions capture the register of a word and act as opposites, and calculated for each word 𝑥 in the corpus a value 𝑟 such that:</p><formula xml:id="formula_0">𝑟_𝑥 = max(0, −𝑥_105) − max(0, −𝑥_140)</formula><p>We then sorted all words by their 𝑟 value. Words with a very low 𝑟-value are those where pixel 140 is dark red while pixel 105 is neutral, and vice versa. Table <ref type="table" target="#tab_0">1</ref> shows the ten words at either end of the list: on the left (right) are the ten words in the dataset with the highest (lowest) 𝑟-value; duplicates (due to spelling differences and compounds like 'ass-kisser') are omitted. Looking at this sample, it appears clear that the words where dimension 105 stands out are of a higher register, while the other group is decidedly informal.</p><p>In the same way, we sorted the entire corpus of static embeddings based on the value at dimension 86 (a single pixel that stands out on the left of the map). This dimension apparently captures not the formality or register of words but instead seems to activate strongly if a word is likely to appear in a pornographic context. We tried the same with the cluster of pixels in the lower right of the map, both with individual pixels and combinations of the group. While there were some similarities, these were not as clear or meaningful as observed before (for example, sorting by dimensions 17 &amp; 9 produced many words related to Catholicism at one end, including, e.g., 'antipope', 'tonsured', 'archpriest', and words which appeared related to customer service at the other, e.g., 'management', 'service', 'customer'). We also sorted the corpus of all words by their value in other random dimensions that do not stand out on the SOM, to assess whether these would also appear to indicate similar semantic explanations for their values. 
However, the lists of words produced by sorting by other dimensions showed no apparent correlations or common characteristics.</p><p>We investigated the register pair 140 &amp; 105 further, testing what effect switching the values of the two dimensions might have on a word. When taking, for example, a word of high register such as 'elucidate', switching the respective values of dimensions 140 and 105 results in a new 300-dimensional vector that does not belong to any known word. However, searching through the corpus of all words for the most similar word to this switched vector (in terms of cosine similarity of the vectors) results in the word 'explain'. This connection holds for many words: applying the same technique to 'sufficient' results in 'enough', switching 'corrosion' leads to 'rust', 'covertly' to 'secret', 'occur' to 'happen', and so on -in a way, this can be used to find simpler synonyms.</p></div>
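Both the 𝑟-value ranking and the dimension-switching lookup can be sketched as follows. Random placeholder vectors stand in for the real GloVe matrix, and `nearest_after_switch` is an assumed helper name, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(2)
words = [f"w{i}" for i in range(50)]   # placeholder vocabulary
E = rng.normal(size=(50, 300))         # placeholder 300-dim embeddings

# Register score from the paper's formula:
#   r_x = max(0, -x_105) - max(0, -x_140)
r = np.maximum(0, -E[:, 105]) - np.maximum(0, -E[:, 140])
ranked = [words[i] for i in np.argsort(r)]   # informal ... formal

def nearest_after_switch(idx):
    """Swap dims 140 and 105 of word idx, return the closest known word
    to the switched vector by cosine similarity."""
    v = E[idx].copy()
    v[105], v[140] = v[140], v[105]
    sims = (E @ v) / (np.linalg.norm(E, axis=1) * np.linalg.norm(v))
    sims[idx] = -np.inf                      # exclude the word itself
    return words[int(np.argmax(sims))]
```

With real GloVe vectors, this lookup is the operation that maps 'elucidate' to 'explain' in the paper's experiment.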
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.">Projecting Dynamic Embeddings</head><p>Analyzing the outlier dimensions in static word embeddings led to some interesting insights. However, it seems that only very few dimensions directly encode symbolic concepts such as register. For most other dimensions, the distance map of the SOM shows that there is very little variation, and their position in the resulting image is likely to be random. With dynamic embeddings, the word vector of a given word depends strongly on the context in which it appeared in the training text. Therefore, we examined whether visualizing these vectors might make it possible to investigate the results of word classification tasks such as POS tagging. If the same word can be used either as a verb or as a noun, somewhere in the vector some information should be encoded as to which concept is more likely at hand in the given context. Our aim was that by visualizing dynamic word vectors with the SOM mapping, we might be able to find regions -that is, groups of dimensions -that are of particular importance for specific POS concepts.</p><p>SpaCy also provides a trainable part-of-speech (POS) tagging model, which consists of two layers: the first takes a text and predicts dynamic, 96-dimensional embeddings for each token in the text, and the second predicts POS tags for these tokens based on the embeddings. We used these 96-dimensional word vectors to investigate dynamic embeddings. To train a SOM on static embeddings, we collected the pre-trained GloVe embeddings of a list of words. Due to the nature of dynamic embeddings, however, this is not possible here; an actual text is required, since the conceptual representation of a word differs depending on its current context. Therefore, we used the Brown corpus <ref type="bibr" target="#b26">[27]</ref> and generated the dynamic embeddings from spaCy's pre-trained language model. 
As spaCy cannot process arbitrarily long texts, we only used the full sentences up to the 1.000.000th character. By removing punctuation and particle tokens, we obtained a dataset of 166.738 non-unique words with unique (dynamic) 96-dimensional word vectors.</p><p>First, we applied the same method as described above for static embeddings, training the SOM on the transposed matrix of word vectors. While there was some more variation in the SOM distance map, there did not appear to be any outliers as strong as in the static embedding map, and this method was not successful in differentiating between different POS concepts. Because of this, we took inspiration from two XAI approaches from computer vision research: the use of occlusion to analyze which features of the input representation are most important in the classification <ref type="bibr" target="#b17">[18]</ref>, and the generation of a prototypical image for a given class <ref type="bibr" target="#b19">[20]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.1.">Masking the Shadows</head><p>By occluding parts of the word vectors, we hoped to find out which of the dimensions were actually necessary to recognize a word as a particular POS class, thus reducing the vector to only its essential areas. First, we tried this with words that the model had classified as a noun. For this, dimensions of the vector were occluded (set to zero) one by one, at each step choosing the dimension whose removal had the least negative impact on the probability of the vector being a noun. This was repeated until the probability dropped below 99% and then until it dropped below 50%. Notably, the first few removals increase the confidence in the noun classification instead of decreasing it. Testing this with a large number of words revealed that confidence in the noun classification usually stayed above 50% until only a few dimensions were left, sometimes as few as two. However, the remaining dimensions (visualized as pixels in the image) are not always the same, although many of the dimensions reappear over repeated tests.</p><p>We repeated the same process for all POS classes that the spaCy model identifies, which are based on the Penn Treebank classes <ref type="bibr" target="#b27">[28]</ref>. The result is that most of the dimensions in a word vector are irrelevant for it to be classified as the same POS with above 50% confidence; only when almost the entire vector is occluded does the prediction change to a different class. Interestingly, NN (singular noun) appears to be the default classification: a vector consisting only of 0s is classified as a noun, albeit with low confidence. </p></div>
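The greedy occlusion loop can be sketched like this. A random linear scorer stands in for spaCy's tagger (which is not what the paper used), and all names and sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
n_dims, n_classes = 96, 5
W = rng.normal(size=(n_dims, n_classes))  # stand-in linear POS scorer
NOUN = 0

def p_noun(v):
    z = v @ W
    z = z - z.max()                  # numerically stable softmax
    p = np.exp(z)
    return p[NOUN] / p.sum()

# A vector the stand-in model is initially confident about.
vec = 0.5 * W[:, NOUN]

# Greedy occlusion: repeatedly zero out the dimension whose removal hurts
# the noun probability least, until confidence drops below 50%.
occluded = vec.copy()
alive = set(range(n_dims))
while p_noun(occluded) >= 0.5 and len(alive) > 1:
    best_d, best_p = None, -1.0
    for d in alive:
        trial = occluded.copy()
        trial[d] = 0.0
        p = p_noun(trial)
        if p > best_p:
            best_d, best_p = d, p
    occluded[best_d] = 0.0
    alive.remove(best_d)

# `alive` now holds the dimensions that were essential for the prediction.
```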
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.2.">Projecting Prototypical Shadows</head><p>Next, we systematically tested the outcome of occlusion by reducing the vectors of all the tokens in the dynamic embedding corpus that the model had originally classified with high confidence (that is, greater than 99%) until confidence dipped below 50%. For each POS class, we calculated the average of these reduced vectors. Figure <ref type="figure" target="#fig_2">2</ref> shows the resulting images for a selection of classes. In some cases, such as for DT (determiners), the reduction almost always leaves the same dimension unoccluded, leading to an average image where only one or a few dimensions appear very strongly. However, for others such as NN, there are many different possible results of the reduction. Thus, the average image is more translucent and does not show a specific region. Classes such as MD, VB, or VBN (different verb forms) seem to be concentrated around different regions. It appears that for POS classes that can be considered conceptually more precise, there are only a few dimensions that are often or always very important for the classification. This is especially the case for classes with a limited number of possible words, or that are marked by their form, such as comparative or superlative adjectives. In contrast, concepts like nouns or verbs are more difficult to grasp.</p></div>
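The averaging step behind these prototypical shadows is a simple mean over the reduced vectors; the toy numbers below are hand-picked for illustration, not real reduced vectors.

```python
import numpy as np

# Illustrative reduced (occluded) vectors for tokens of one POS class:
# zeros are occluded dimensions, non-zeros survived the reduction.
reduced = np.array([
    [0.0, 0.9, 0.0, -0.7],
    [0.0, 1.1, 0.0,  0.0],
    [0.0, 0.8, 0.2,  0.0],
])

# The class's prototypical shadow is the average of the reduced vectors:
# dimensions that survive for most tokens stay strong, while rarely
# surviving ones fade toward zero and render as translucent pixels.
shadow = reduced.mean(axis=0)
```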
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.3.">Analysis of POS-tagging from Dynamic Embeddings</head><p>It appears that for POS classification, it is possible to identify areas in the vector images that are most important for the model to identify different POS classes. Therefore, we used these visualizations to investigate a problem that we had come up against repeatedly in prior work: models that are fine-tuned from pre-trained embeddings tend to struggle with very domain-specific language that differs from more standard texts. In particular, we have often struggled with the problem that recipe texts employ a kind of language that makes it difficult to identify the main verb of a sentence. This can be due to, for example, words being used as both verbs and nouns (e.g. 'juice'), words being left out (e.g. "chop tomatoes" instead of "chop the tomatoes"), missing punctuation, etc. First, we decided to investigate one particular word, which we had stumbled upon in a previous study as an example that spaCy's POS tagging model misclassified. We looked at three sentences containing the word 'garlic':</p><p>(1) Add the garlic to the pan.</p><p>(2) Add cauliflower and garlic mixture to the pot, mixing carefully to combine.</p><p>(3) You have to garlic and salt the food.</p><p>In sentence (1), 'garlic' is correctly classified as a noun. In sentence (2), however, it is incorrectly classified as a verb with a probability of 49%, while the noun tag only has a probability of 46%. Sentence (3) is an example in which 'garlic' is used as a verb and correctly classified as such.</p><p>Figure <ref type="figure" target="#fig_4">3</ref> shows the visualizations of the three different vectors representing the word 'garlic' in these three different sentences. As the confidence for the second version is already quite low, and reducing the vectors would lead to different dimensions being left unoccluded, we did not reduce the vectors here. 
Instead, we masked most of the image, leaving those dimensions highlighted that were most often (across the whole corpus) unoccluded at 50% confidence.</p><p>It appears that the vector of the noun use of 'garlic' that was incorrectly classified as a verb (sentence (2), 3a) most strongly differs from the correctly classified noun (sentence (1), 3b) in the pixel on the far left at (0,2) and the one on the right at (9,2). Those two pixels are the same color in the vector representing 'garlic' as a verb (sentence (3), 3c), i.e., the opposite colors from the noun in 3b. Thus, we inverted these two pixels by multiplying their respective values with -1. Figure <ref type="figure" target="#fig_4">3d</ref> depicts the result of these inversions. We used this 'corrected' vector as input for spaCy's POS tagging model. As expected, the model now classifies this vector as a noun, with a confidence of 88%. This means that we were able to visually identify the exact dimensions that were the reason for the incorrect classification of this token.</p><p>Next, we tried a slightly different approach with another problem, where a verb was incorrectly classified as an adjective, as seen in sentence (4) below. As noun seems to be the default POS class, reducing the vectors of noun tokens leaves only very few dimensions unoccluded at 50%, and comparing them to the conceptual shadow shown in Figure <ref type="figure" target="#fig_2">2</ref> is not very helpful. However, this is not a problem for adjectives. We therefore considered two sentences:</p><p>(4) Heat oil in a deep frying pan or wok until very hot.</p><p>(5) Heat some vegetable oil in the same frying pan you used before.</p><p>In sentence (4), the first token 'heat' was incorrectly classified as an adjective instead of a verb. Therefore, we looked at the reduced vector of the token, as well as the conceptual shadows of the verb and adjective classes. 
Those were compared to a vector from sentence (5), where the same word, 'heat', in the same position in the sentence, was correctly classified as a verb. Figure <ref type="figure" target="#fig_5">4</ref> shows these images. The two vectors that both represent the word 'heat' clearly share some features. Interestingly, many of the dimensions that remain in the reduced vectors are similar in both versions; clearly, small changes are enough to switch the classification from verb to adjective. </p></div>
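The 'correction' applied to the misclassified vector amounts to a sign flip of selected dimensions before re-tagging. In this sketch, the dimension indices are assumed to have been recovered from the SOM pixel coordinates (e.g. (0,2) and (9,2)); that pixel-to-dimension mapping is part of the visualization and not reproduced here.

```python
import numpy as np

def invert_dimensions(vector, dims):
    """Multiply selected dimensions of a word vector by -1.

    Feeding the corrected vector back into the POS tagger then tests
    whether those dimensions were responsible for the misclassification.
    """
    corrected = np.asarray(vector, dtype=float).copy()
    corrected[list(dims)] *= -1.0
    return corrected
```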
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Discussion</head><p>The nature of this work is rather exploratory. The results of our experiments shed some light on how the meaning of linguistic concepts is encoded in high-dimensional word embeddings, which have so far remained a tightly closed black box in NLP. Addressing the cognitive elephant in the room, it is clear that human cognition is based on combinations of statistical processes together with increasingly symbolic generalizations over the extracted patterns. Some well-known phenomena, such as prototypicality effects or radial categories <ref type="bibr" target="#b28">[29]</ref>, are beyond the scope of most symbolic approaches, yet become quite easy to see in the conceptual shadows shown herein, where we can compare the shadows of very prototypical nouns and verbs to ones that are less nouny or verby. We are not aware of many other works which use visualizations of word vectors in their entirety, apart from the "bar code"-like images described in the beginning. By organizing the dimensions of these vectors on a map by training a SOM, we were able to identify areas of interest, as well as dimensions that appear to "belong together", such as the pair of dimensions that seems to encode the register of a word. Together with the regions encoding POS, we can now form rudimentary ensembles of shadows that encode, for example, high-register nouns or vernacular verbs, as depicted in Figure <ref type="figure" target="#fig_6">5</ref>.</p><p>So far, this method has allowed us to identify small areas which appear to have recognizable tasks in the semantic representation, and to point out which of the dimensions of a word vector might be responsible, e.g., for an incorrect classification. However, this in turn poses the question of why the dimension in question was "wrong" in the first place. 
To investigate this, we have to follow this lead one step deeper and investigate what resulted in this particular weight when the vector was generated from the input text. One application of this work is to use it as a starting point from which to analyze downstream errors in NLP tasks and explain their origins.</p><p>It is important to point out that any conclusion drawn from these visualizations is only ever related to the specific set of vectors on which the SOM was trained. A different kind of static embedding than GloVe might very well result in a very different map, with different outlier dimensions which might not hold similar meaning to the ones we found here. This, however, is more a feature than a bug in our minds, as we visualize how a specific sub-symbolic system encodes conceptual dimensions, which is, by its very nature, based on its training. In spite of the current limitations of the work presented above, we find that mapping the individual dimensions of word embeddings as a 2D image makes it possible to gather fascinating insights into the internal makeup of distributed semantic representations. We hope that this kind of low-level analysis of embeddings can serve as a starting point for gaining a deeper understanding of the neural networks used in NLP and other classification tasks.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head></head><label></label><figDesc>(a) Visualization of the distance map of the SOM trained on static word embeddings. The map is comprised of 300 neurons, organized in a 15x20 map. Each neuron represents exactly one of the 300 dimensions of the embedding; overlaid in red are the numbers of the dimensions as they are ordered in the word vectors. (b) A number of example word vectors visualized as images based on the SOM organization. Values around 0 are white, negative numbers red, and positive numbers blue. 
It appears that the most yellow dimensions identified in the SOM are also among those with the highest absolute values.</figDesc></figure>
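The dimension map described in this caption can be approximated with a small self-organizing map in plain numpy. The paper does not specify its training setup, so everything below (hyperparameters, decay schedules) is illustrative, and the strict one-to-one assignment of dimensions to neurons is not enforced; each dimension is simply placed at its best-matching unit. Each training sample is one embedding dimension's profile across the vocabulary, i.e. a column of the embedding matrix.

```python
import numpy as np

def train_dimension_som(embeddings, map_shape=(15, 20), iters=2000,
                        lr=0.5, sigma=3.0, seed=0):
    """Organize embedding dimensions on a 2D map with a minimal SOM.

    `embeddings` has shape (n_words, n_dims); each SOM sample is one
    dimension's values across all words. Returns the neuron weights and
    a function mapping a dimension index to its map coordinate (the
    'pixel' position used in the shadow images).
    """
    rng = np.random.default_rng(seed)
    samples = embeddings.T                          # one row per dimension
    h, w = map_shape
    weights = rng.normal(size=(h, w, samples.shape[1]))
    ys, xs = np.mgrid[0:h, 0:w]
    for t in range(iters):
        frac = t / iters
        lr_t = lr * (1.0 - frac)                    # linearly decaying rate
        sig_t = sigma * (1.0 - frac) + 0.5          # shrinking neighborhood
        s = samples[rng.integers(len(samples))]
        d = np.linalg.norm(weights - s, axis=2)     # distance to every neuron
        by, bx = np.unravel_index(np.argmin(d), d.shape)
        g = np.exp(-((ys - by) ** 2 + (xs - bx) ** 2) / (2 * sig_t ** 2))
        weights += lr_t * g[..., None] * (s - weights)

    def pixel_of(dim):
        d = np.linalg.norm(weights - samples[dim], axis=2)
        return np.unravel_index(np.argmin(d), d.shape)

    return weights, pixel_of
```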
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: Visualization of static word vectors.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2: Average Shadows of POS Concepts</figDesc><graphic coords="9,110.13,84.18,375.02,330.81" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_3"><head></head><label></label><figDesc>(a) Used as a noun, incorrectly classified as a verb. (b) Used as a noun, correctly classified as a noun. (c) Used as a verb, correctly classified as a verb. (d) Same vector as in (a) with two inverted values.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_4"><head>Figure 3 :</head><label>3</label><figDesc>Figure 3: Four different vector representations of the word 'garlic', overlaid to highlight which pixels are especially relevant for classification as nouns. Here, we use a purple overlay over those dimensions which did not appear in at least 1/4 of minimal nouns.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_5"><head>Figure 4 :</head><label>4</label><figDesc>Figure 4: Two different vector representations of the word 'Heat'. For each, the three images show the entire embedding, the same vector reduced to 50% confidence, and the prototypical concept shadow of the POS class it was classified as.</figDesc><graphic coords="11,99.71,204.30,395.86,112.76" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_6"><head>Figure 5 :</head><label>5</label><figDesc>Figure 5: The general picture, showing a set of tokens in a given context, their conceptual types and the corresponding conceptual shadows (as illustrations).</figDesc><graphic coords="12,130.96,84.19,333.36,191.31" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1</head><label>1</label><figDesc></figDesc><table><row><cell>word</cell><cell>140</cell><cell>105</cell><cell>r</cell><cell>word</cell><cell>140</cell><cell>105</cell><cell>r</cell></row><row><cell>'coercive'</cell><cell>0.89</cell><cell>-3.15</cell><cell>3.15</cell><cell>'fucking'</cell><cell>-4.21</cell><cell>-0.04</cell><cell>-4.17</cell></row><row><cell>'plurality'</cell><cell>0.55</cell><cell>-3.12</cell><cell>3.12</cell><cell>'ass'</cell><cell>-4.14</cell><cell>-0.03</cell><cell>-4.11</cell></row><row><cell>'minimise'</cell><cell>0.31</cell><cell>-3.11</cell><cell>3.11</cell><cell>'fuck'</cell><cell>-3.93</cell><cell>0.08</cell><cell>-3.93</cell></row><row><cell>'deleterious'</cell><cell>0.48</cell><cell>-3.10</cell><cell>3.10</cell><cell>'wanna'</cell><cell>-3.94</cell><cell>-0.02</cell><cell>-3.91</cell></row><row><cell>'lessening'</cell><cell>0.20</cell><cell>-3.07</cell><cell>3.07</cell><cell>'song'</cell><cell>-3.90</cell><cell>0.63</cell><cell>-3.90</cell></row><row><cell>'predetermined'</cell><cell>0.05</cell><cell>-3.05</cell><cell>3.05</cell><cell>'lol'</cell><cell>-3.87</cell><cell>0.27</cell><cell>-3.87</cell></row><row><cell>'concomitant'</cell><cell>0.90</cell><cell>-3.05</cell><cell>3.05</cell><cell>'bitch'</cell><cell>-3.85</cell><cell>0.02</cell><cell>-3.85</cell></row><row><cell>'societal'</cell><cell>0.04</cell><cell>-3.03</cell><cell>3.03</cell><cell>'cute'</cell><cell>-3.78</cell><cell>0.38</cell><cell>-3.78</cell></row><row><cell>'constraining'</cell><cell>0.62</cell><cell>-3.03</cell><cell>3.03</cell><cell>'ya'</cell><cell>-3.78</cell><cell>0.29</cell><cell>-3.78</cell></row><row><cell>'quantifiable'</cell><cell>0.80</cell><cell>-3.04</cell><cell>3.02</cell><cell>'movie'</cell><cell>-3.74</cell><cell>0.40</cell><cell>-3.74</cell></row><row><cell cols="4">(a) Highest r-values</cell><cell cols="4">(b) Lowest r-values</cell></row></table></figure>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">A linguistic ontology for the semantic web</title>
		<author>
			<persName><forename type="first">S</forename><surname>Farrar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Langendoen</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">GLOT International</title>
		<imprint>
			<biblScope unit="volume">7</biblScope>
			<biblScope unit="page" from="97" to="100" />
			<date type="published" when="2004">2004</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">The ontowordnet project: Extension and axiomatization of conceptual relations in wordnet</title>
		<author>
			<persName><forename type="first">A</forename><surname>Gangemi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Navigli</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Velardi</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">On The Move to Meaningful Internet Systems 2003: CoopIS, DOA, and ODBASE</title>
				<editor>
			<persName><forename type="first">R</forename><surname>Meersman</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Z</forename><surname>Tari</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">D</forename><forename type="middle">C</forename><surname>Schmidt</surname></persName>
		</editor>
		<meeting><address><addrLine>Berlin Heidelberg; Berlin, Heidelberg</addrLine></address></meeting>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2003">2003</date>
			<biblScope unit="page" from="820" to="838" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Linginfo: Design and applications of a model for the integration of linguistic information in ontologies</title>
		<author>
			<persName><forename type="first">P</forename><surname>Cimiano</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Buitelaar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Frank</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Racioppa</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Sintek</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Kiesel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Romanelli</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Loos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Declerck</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Engel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Sonntag</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Micelli</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Porzel</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of OntoLex at LREC</title>
				<meeting>of OntoLex at LREC</meeting>
		<imprint>
			<publisher>ELRA</publisher>
			<date type="published" when="2006">2006</date>
			<biblScope unit="page" from="28" to="32" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<monogr>
		<author>
			<persName><forename type="first">T</forename><surname>Mikolov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Sutskever</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Corrado</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Dean</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1310.4546</idno>
		<title level="m">Distributed representations of words and phrases and their compositionality</title>
				<imprint>
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">GloVe: Global vectors for word representation</title>
		<author>
			<persName><forename type="first">J</forename><surname>Pennington</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Socher</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">D</forename><surname>Manning</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP)</title>
				<meeting>the 2014 conference on empirical methods in natural language processing (EMNLP)</meeting>
		<imprint>
			<date type="published" when="2014">2014</date>
			<biblScope unit="page" from="1532" to="1543" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Explaining explanations: An overview of interpretability of machine learning</title>
		<author>
			<persName><forename type="first">L</forename><forename type="middle">H</forename><surname>Gilpin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Bau</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><forename type="middle">Z</forename><surname>Yuan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Bajwa</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Specter</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of DSAA 2018</title>
				<meeting>of DSAA 2018</meeting>
		<imprint>
			<publisher>IEEE</publisher>
			<date type="published" when="2018">2018</date>
			<biblScope unit="page" from="80" to="89" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Explainability for natural language processing</title>
		<author>
			<persName><forename type="first">M</forename><surname>Danilevsky</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Dhanorkar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Popa</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Qian</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Xu</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of KDD 2021</title>
				<meeting>of KDD 2021</meeting>
		<imprint>
			<date type="published" when="2021">2021</date>
			<biblScope unit="page" from="4033" to="4034" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Towards a standard upper ontology</title>
		<author>
			<persName><forename type="first">I</forename><surname>Niles</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Pease</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of FOIS 2001</title>
				<meeting>of FOIS 2001<address><addrLine>New York, NY, USA</addrLine></address></meeting>
		<imprint>
			<publisher>Association for Computing Machinery</publisher>
			<date type="published" when="2001">2001</date>
			<biblScope unit="page" from="2" to="9" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<monogr>
		<author>
			<persName><forename type="first">C</forename><surname>Masolo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Borgo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Gangemi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Guarino</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Oltramari</surname></persName>
		</author>
		<title level="m">Wonderweb deliverable d18, ontology library (final), ICT project</title>
				<imprint>
			<date type="published" when="2003">2003</date>
			<biblScope unit="page">31</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">What say you: An ontological representation of imperative meaning for human-robot interaction</title>
		<author>
			<persName><forename type="first">R</forename><surname>Porzel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><forename type="middle">S</forename><surname>Cangalovic</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of JOWO 2020</title>
		<title level="s">CEUR Workshop Proceedings</title>
		<meeting>of JOWO 2020</meeting>
		<imprint>
			<publisher>CEUR-WS</publisher>
			<date type="published" when="2020">2020</date>
			<biblScope unit="volume">2708</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Understanding the semantic web through descriptions and situations</title>
		<author>
			<persName><forename type="first">A</forename><surname>Gangemi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Mika</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the ODBASE Conference</title>
				<meeting>the ODBASE Conference</meeting>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2003">2003</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<monogr>
		<author>
			<persName><forename type="first">J</forename><surname>Devlin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M.-W</forename><surname>Chang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Toutanova</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1810.04805</idno>
		<title level="m">BERT: Pre-training of deep bidirectional transformers for language understanding</title>
				<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Contextual string embeddings for sequence labeling</title>
		<author>
			<persName><forename type="first">A</forename><surname>Akbik</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Blythe</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Vollgraf</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of COLING 2018</title>
				<meeting>of COLING 2018</meeting>
		<imprint>
			<date type="published" when="2018">2018</date>
			<biblScope unit="page" from="1638" to="1649" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<monogr>
		<title level="m" type="main">Improving language understanding by generative pre-training</title>
		<author>
			<persName><forename type="first">A</forename><surname>Radford</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Narasimhan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Salimans</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Sutskever</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">Language models are unsupervised multitask learners</title>
		<author>
			<persName><forename type="first">A</forename><surname>Radford</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Child</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Luan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Amodei</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Sutskever</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">OpenAI blog</title>
		<imprint>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="page">9</biblScope>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<monogr>
		<author>
			<persName><forename type="first">A</forename><surname>Vaswani</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Shazeer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Parmar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Uszkoreit</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Jones</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">N</forename><surname>Gomez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Kaiser</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Polosukhin</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1706.03762</idno>
		<title level="m">Attention is all you need</title>
				<imprint>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">&quot;Why should I trust you?&quot;: Explaining the predictions of any classifier</title>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">T</forename><surname>Ribeiro</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Singh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Guestrin</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of KDD 2016</title>
				<meeting>of KDD 2016<address><addrLine>New York, NY, USA</addrLine></address></meeting>
		<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="2016">2016</date>
			<biblScope unit="page" from="1135" to="1144" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">Visualizing and understanding convolutional networks</title>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">D</forename><surname>Zeiler</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Fergus</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Computer Vision -ECCV 2014</title>
				<editor>
			<persName><forename type="first">D</forename><surname>Fleet</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">T</forename><surname>Pajdla</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">B</forename><surname>Schiele</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">T</forename><surname>Tuytelaars</surname></persName>
		</editor>
		<meeting><address><addrLine>Cham</addrLine></address></meeting>
		<imprint>
			<publisher>Springer International Publishing</publisher>
			<date type="published" when="2014">2014</date>
			<biblScope unit="page" from="818" to="833" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<monogr>
		<title level="m" type="main">Synthesizing the preferred inputs for neurons in neural networks via deep generator networks</title>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">M</forename><surname>Nguyen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Dosovitskiy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Yosinski</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Brox</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Clune</surname></persName>
		</author>
		<idno>CoRR abs/1605.09304</idno>
		<ptr target="http://arxiv.org/abs/1605.09304" />
		<imprint>
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">Understanding neural networks via feature visualization: A survey</title>
		<author>
			<persName><forename type="first">A</forename><surname>Nguyen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Yosinski</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Clune</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Explainable AI: interpreting, explaining and visualizing deep learning</title>
				<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="55" to="76" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<monogr>
		<author>
			<persName><forename type="first">D</forename><surname>Doran</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Schulz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><forename type="middle">R</forename><surname>Besold</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1710.00794</idno>
		<title level="m">What does explainable ai really mean? a new conceptualization of perspectives</title>
				<imprint>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b21">
	<analytic>
		<title level="a" type="main">Visualizing and understanding neural models in nlp</title>
		<author>
			<persName><forename type="first">J</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Hovy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Jurafsky</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</title>
				<imprint>
			<date type="published" when="2016">2016</date>
			<biblScope unit="page" from="681" to="691" />
		</imprint>
	</monogr>
	<note>Association for Computational Linguistics</note>
</biblStruct>

<biblStruct xml:id="b22">
	<analytic>
		<title level="a" type="main">The self-organizing map</title>
		<author>
			<persName><forename type="first">T</forename><surname>Kohonen</surname></persName>
		</author>
		<idno type="DOI">10.1109/5.58325</idno>
	</analytic>
	<monogr>
		<title level="j">Proceedings of the IEEE</title>
		<imprint>
			<date type="published" when="1990">1990</date>
			<biblScope unit="volume">78</biblScope>
			<biblScope unit="page" from="1464" to="1480" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b23">
	<monogr>
		<title level="m" type="main">spaCy 2: Natural language understanding with Bloom embeddings, convolutional neural networks and incremental parsing</title>
		<author>
			<persName><forename type="first">M</forename><surname>Honnibal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Montani</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
	<note>To appear</note>
</biblStruct>

<biblStruct xml:id="b24">
	<analytic>
		<title level="a" type="main">WordNet: A lexical database for English</title>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">A</forename><surname>Miller</surname></persName>
		</author>
		<idno type="DOI">10.1145/219717.219748</idno>
	</analytic>
	<monogr>
		<title level="j">Commun. ACM</title>
		<imprint>
			<biblScope unit="volume">38</biblScope>
			<biblScope unit="page" from="39" to="41" />
			<date type="published" when="1995">1995</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b25">
	<monogr>
		<title level="m" type="main">Natural language processing with Python: analyzing text with the natural language toolkit</title>
		<author>
			<persName><forename type="first">S</forename><surname>Bird</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Klein</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Loper</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2009">2009</date>
			<publisher>O&apos;Reilly Media, Inc</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b26">
	<monogr>
		<title level="m" type="main">Brown Corpus Manual</title>
		<author>
			<persName><forename type="first">W</forename><forename type="middle">N</forename><surname>Francis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Kucera</surname></persName>
		</author>
		<ptr target="http://icame.uib.no/brown/bcm.html" />
		<imprint>
			<date type="published" when="1979">1979</date>
			<pubPlace>Providence, Rhode Island, US</pubPlace>
		</imprint>
		<respStmt>
			<orgName>Department of Linguistics, Brown University</orgName>
		</respStmt>
	</monogr>
	<note type="report_type">Technical Report</note>
</biblStruct>

<biblStruct xml:id="b27">
	<monogr>
		<title level="m" type="main">Building a large annotated corpus of English: The Penn Treebank</title>
		<author>
			<persName><forename type="first">M</forename><surname>Marcus</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Santorini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">A</forename><surname>Marcinkiewicz</surname></persName>
		</author>
		<imprint>
			<date type="published" when="1993">1993</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b28">
	<analytic>
		<title level="a" type="main">Cognitive representations of semantic categories</title>
		<author>
			<persName><forename type="first">E</forename><surname>Rosch</surname></persName>
		</author>
		<idno type="DOI">10.1037/0096-3445.104.3.192</idno>
	</analytic>
	<monogr>
		<title level="j">Journal of Experimental Psychology: General</title>
		<imprint>
			<biblScope unit="volume">104</biblScope>
			<biblScope unit="page" from="192" to="233" />
			<date type="published" when="1975">1975</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
