<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Supervised Visualization of Vocabulary Knowledge towards Explainable Support of Second Language Learners</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author role="corresp">
							<persName><forename type="first">Yo</forename><surname>Ehara</surname></persName>
							<email>ehara.yo@sist.ac.jp</email>
							<affiliation key="aff0">
								<orgName type="institution">Shizuoka Institute of Science and Technology</orgName>
								<address>
									<addrLine>2200-2</addrLine>
								</address>
							</affiliation>
							<affiliation key="aff1">
								<orgName type="institution">Toyosawa</orgName>
								<address>
									<settlement>Fukuroi</settlement>
									<region>Shizuoka</region>
									<country key="JP">Japan</country>
								</address>
							</affiliation>
							<affiliation key="aff2">
								<orgName type="institution">Stanford University</orgName>
								<address>
									<settlement>Palo Alto</settlement>
									<region>California</region>
									<country key="US">USA</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Supervised Visualization of Vocabulary Knowledge towards Explainable Support of Second Language Learners</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">3BDAF726AFD77CEA7D9AB61AE46897FC</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T15:38+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>In second language learning, it is crucial to identify gaps in knowledge of the language between second language learners and native speakers. Such a gap exists even when learning a single word in a second language. As the semantic broadness of a word differs from language to language, language learners must learn how broadly a word can be used in a language. For example, certain languages use different words for "period" in "a period of time" and "period pains," even though both are nouns. Learners whose native languages are such languages typically have only partial knowledge of a word, even though they think they know the word "period," producing a gap between them and native speakers. Language learners typically want explanations for these word usage differences, which even native speakers find difficult to explain and costly to annotate. To support language learners in noticing these challenging differences easily and intuitively, this paper proposes a novel supervised visualization of the usages of a word. In our method, the usages of an inputted word in large corpora written by native speakers are visualized, taking the semantic proximity between the usages into account. Then, for the single inputted word, our method makes a personalized prediction of the word usages that each learner may know, based on his/her results on a quick vocabulary test, which takes approximately 30 minutes. The experiment results show that our method's adjusted usage frequency counts predict vocabulary test responses better than raw usage frequency counts do, implying that the word usage prediction is accurate.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Introduction</head><p>Acquiring a second language requires repeated efforts to narrow the gap between language learners' knowledge of the language and that of native speakers. Making such gaps intuitively understandable greatly helps language learners self-teach the language and also helps researchers build effective language tutoring systems. Some gaps, such as vocabulary size or time spent in language learning, are intuitively easy to understand and, hence, are well studied. However, in second language learning, most gaps are related to meaning and semantics and are inherently abstract. Hence, visualization is essential to make these gaps intuitively understandable.</p><p>The broadness of a word, or how a word can be used in the language to express different concepts, is one such abstract gap <ref type="bibr" target="#b26">(Read 2000)</ref>. Because the meaning of a word differs from language to language, when learning a word in a second language, there typically exists a gap between what learners think the word means and how the word is actually used in the language. Polysemous words are examples that are easy to understand: "book" can mean an item associated with reading, or it can mean to make a reservation. Beyond such examples, to which the part-of-speech tagging techniques in natural language processing (NLP) seem applicable, some examples are more subtle: some languages always use different words for "time" in "in a short time" or "for a time," in which the word "time" refers to a period, and in "time and space" or "time heals all wounds," in which "time" is used as an abstract concept. In another example, many languages use different words for "period" in "a period of time" and "period" in "period pains". 
In this way, the granularity of a word's senses that should be distinguished for second language acquisition varies from word to word.</p><p>Polysemous words encode different concepts in one word; hence, they have been one of the central topics in knowledge engineering. A substantial amount of work has been conducted to automatically recognize polysemous words for practical applications by using machine learning, including work in the previous AAAI-MAKE workshops <ref type="bibr" target="#b24">(Ramprasad and Maddox 2019;</ref><ref type="bibr" target="#b9">Hinkelmann et al. 2019;</ref><ref type="bibr" target="#b13">Laurenzi et al. 2019</ref>). However, even among the few such applications for second language acquisition <ref type="bibr" target="#b8">(Heilman et al. 2007;</ref><ref type="bibr" target="#b4">Dias and Moraliyski 2009)</ref> in the artificial intelligence (AI) community, the challenging problem of the different granularity of a word's senses in second language acquisition has not been addressed. In second language acquisition, learners are typically novices rather than linguistic experts; hence, systems to support their learning need to be intuitively understandable. Our goal is to make the gaps among word usages intuitively understandable, even for novice language learners.</p><p>To this end, this paper proposes a novel supervised visualization method for word usages to assist in learning the different usages of a word. Our method first searches for all usages of the target word in a large corpus written by native speakers. Then, it calculates the vector representation of each usage, or occurrence, of the word by using a contextualized word embedding method <ref type="bibr" target="#b3">(Devlin et al. 2019)</ref>. Contextualized word embedding methods <ref type="bibr" target="#b23">(Peters et al. 2018;</ref><ref type="bibr" target="#b3">Devlin et al. 2019)</ref> are recently proposed methods that embed each occurrence of a word, capturing the context of each usage of the word.</p><p>Then, our method is trained to visualize the contextualized word embedding vectors by plotting each usage as a point in a two-dimensional space. Unlike a typical visualization method that merely projects the vectors to a two-dimensional space, our method is trained to fit and visually explain a given supervision dataset. This means that the same vectors are visualized in different ways if the supervision dataset differs. Here, the supervision is a vocabulary test result dataset consisting of matrix-format data that record which learner answered which word question correctly or incorrectly. The method visualizes the areas a learner may know by classifying each usage point in the visualization as known/not known to the learner. This classification is conducted in a personalized manner because learners' language skills and specialized fields differ. The learner only needs to take a 30-minute vocabulary test for this purpose.</p><p>Figure <ref type="figure" target="#fig_0">1</ref> shows an example visualization using our method. "To haunt" has two different meanings in English, the first being "to chase" and the other "to curse," or to be affected by ghosts or misfortune. Each point shows a usage of the word in a corpus written by native speakers. The point colors indicate whether a usage is predicted to be known to the learner. The right side of the figure, within the dotted curve, is predicted to be known to the learner. 
In this way, our method visualizes the semantic area the learner knows.</p><p>Our contributions are as follows:</p><p>• For second language vocabulary learning, we propose a novel supervised visualization model that captures word broadness via a personalized prediction of each learner's knowledge of usages.</p><p>• As our visualization uses a vocabulary test result dataset as supervision, learners can understand which usages of the inputted word are predicted to be known/not known to them. Unlike previous methods that output automatic explanations of machine-learning models, our method is much more intuitive and novice-friendly in the sense that language learners do not need to know anything about machine-learning models.</p><p>• We evaluated our method in terms of predictive accuracy on a vocabulary test result dataset and achieved better results than baselines.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Related Work</head><p>Explainable machine learning studies</p><p>While deep learning-based methods have outperformed conventional machine learning methods such as support vector machines (SVMs) in many tasks, the parameters of deep learning methods are typically more difficult to interpret than those of conventional models. To this end, in the machine learning and artificial intelligence community, a number of methods have been proposed to extract explanations from trained machine-learning models, or to train models taking explainability into account (Ribeiro, Singh, and Guestrin 2016; Koh and Liang 2017; Lundberg and Lee 2017; Ribeiro, Singh, and Guestrin 2018). However, the purpose of these methods is to explain machine-learning models to help machine-learning engineers and researchers understand the models. Obviously, second language learners are usually not machine-learning engineers or researchers. Therefore, these studies have different purposes, and it is difficult to apply their methods to help learners understand the models. Language learners are typically not even interested in the models. Rather, learners' interests reside in understanding their current learning status and what they should learn to improve it. Hence, to meet learners' needs, a model should let a learner see his/her current learning status and what he/she needs to learn in the near future.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Word Embedding Visualization Studies</head><p>Word embedding techniques have been extensively studied in natural language processing (NLP) to obtain vector representations of words, typically using neural networks. word2vec is a seminal work in this line of studies <ref type="bibr" target="#b19">(Mikolov et al. 2013)</ref>. Subsequent papers report improvements in how accurately the vectors represent words, typically by comparing the distances between word vectors with human judgments of semantic proximity between words <ref type="bibr" target="#b22">(Pennington, Socher, and Manning 2014)</ref>.</p><p>Early studies on word embeddings address how to make one vector for each word. As one vector representation is modeled to denote one meaning, this limitation is obviously problematic for polysemous words. Several previous studies tackled this problem and proposed methods to estimate the number of a word's meanings and an embedding for each meaning of the word (Athiwaratkun, Wilson, and Anandkumar 2018). Recently, however, contextualized word embeddings <ref type="bibr" target="#b23">(Peters et al. 2018;</ref><ref type="bibr" target="#b3">Devlin et al. 2019</ref>) quickly became popular. With these methods, we can obtain an embedding for each usage, or occurrence, of a word, considering the context of the occurrence of the word in a running sentence. These methods can also be seen as estimating word embeddings for polysemous words, under an extreme assumption that each occurrence of a word has a different meaning. As contextualized word embeddings have been shown to be successful in many tasks, in current NLP, the former strategy of estimating both the number of meanings of a word and an embedding for each meaning is employed only when necessary.</p><p>Following the rise of word embedding techniques, visualization studies were proposed to visualize word embeddings. 
The study by <ref type="bibr">(Smilkov et al. 2016</ref>) simply reported their development of a tool to visualize embeddings of different words. The study by <ref type="bibr" target="#b15">(Liu et al. 2017</ref>) introduced applying visualization of word embeddings to analyze semantic relationships between words. Both papers use principal component analysis (PCA) and t-SNE <ref type="bibr" target="#b18">(Maaten and Hinton 2008)</ref> for visualization. To our knowledge, we are the first to visualize contextualized word embeddings, in which each occurrence of a word, rather than the word itself, is visualized, with a practical purpose in language education.</p><p>In addition to the visualization, our method can also predict the usages that each learner is familiar/unfamiliar with, in a personalized manner, when vocabulary test result data of dozens of learners are provided, such as the data in <ref type="bibr" target="#b6">(Ehara 2018)</ref>. While there exist previous studies <ref type="bibr" target="#b6">(Ehara 2018;</ref><ref type="bibr" target="#b14">Lee and Yeung 2018;</ref><ref type="bibr" target="#b30">Yeung and Lee 2018)</ref> that predict the words that each learner is familiar/unfamiliar with from such data by using simple machine-learning classification, our method tackles a more difficult problem: predicting which usages of a word are known/unknown to the learner.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Concordancer studies</head><p>While our proposed method is novel as a visualization, software tools that search for the usages of an inputted word for educational purposes and display them are themselves not novel: such software is known as concordancers. Concordancers target learners, educators, and linguists as primary users. They are interactive software tools that retrieve all usages of the inputted word in a large corpus and display the list of the usages, each of which comes with the surrounding word patterns <ref type="bibr">(Hockey and Martin 1987)</ref>. Concordancers have also been studied to support translators, who are second language learners in many cases <ref type="bibr">(Wu et al. 2004;</ref><ref type="bibr" target="#b10">Jian, Chang, and Chang 2004;</ref><ref type="bibr" target="#b17">Lux-Pogodalla, Besagni, and Fort 2010)</ref>.</p><p>Figure <ref type="figure" target="#fig_1">2</ref> shows a screenshot from a current concordancer<ref type="foot" target="#foot_0">1</ref> . In this screenshot, the word "book" is searched. Then, the list of word usages is shown. Each word usage comes with its surrounding words so that language learners can see how the word is used. As the list is sorted in alphabetical order of the preceding word, it shows "a book" and "the book" in totally different positions, which is not helpful for language learners. While some concordancers support listing the usages of "book" as a noun by tagging texts with parts of speech in advance, this does not help in seeing the different usages of a word when the parts of speech of the usages are identical. For example, the word "bank" has polysemous meanings sharing the same part of speech: one as a financial organization and another as an embankment.</p></div>
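The retrieval-and-display behavior of a concordancer described above can be sketched as a toy keyword-in-context (KWIC) routine. The two-sentence corpus, the `kwic` helper, and the window size below are illustrative assumptions, not any concordancer's actual code.

```python
# Toy keyword-in-context (KWIC) sketch of what a concordancer does:
# find every usage of a word in a corpus and show its surrounding words.
def kwic(sentences, word, window=3):
    rows = []
    for sent in sentences:
        toks = sent.lower().split()
        for k, t in enumerate(toks):
            if t == word:
                left = " ".join(toks[max(0, k - window):k])
                right = " ".join(toks[k + 1:k + 1 + window])
                rows.append((left, t, right))
    return rows

corpus = [
    "I read a book about whales",
    "Please book a table for two",
]
for left, w, right in kwic(corpus, "book"):
    print(f"{left:>25} | {w} | {right}")
```

Note that, as the paper observes, such surrounding-word listings alone cannot separate the two senses of "book"; both usages are returned in one flat list.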
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Personalized complex word identification studies</head><p>In this study, a part of our goal is to identify complex usages of a word in a running text. In other words, for one word, one usage of the word in running text may be complex for a learner while another usage of the same word is not. There are previous studies in the NLP literature that identify complex words in a personalized manner <ref type="bibr" target="#b5">(Ehara et al. 2012;</ref><ref type="bibr" target="#b14">Lee and Yeung 2018)</ref>. These studies predict the words that each learner knows based on each learner's result on a short vocabulary test, which typically takes a learner 30 minutes to solve. There are also many studies that identify complex usages in a non-personalized manner, as summarized in <ref type="bibr" target="#b20">(Paetzold and Specia 2016;</ref><ref type="bibr" target="#b31">Yimam et al. 2018)</ref>.</p><p>However, to our knowledge, the task of identifying complex usages in a personalized manner is novel. Our method is also novel in that it trains how to visualize the usages so that learners can visually understand the usage differences, by using the learners' vocabulary test data.</p></div>
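As a rough illustration of this line of work, the sketch below fits a logistic-regression classifier that predicts whether a learner knows a word from the word's log frequency plus a one-hot learner indicator. All data are synthetic toy values, and the feature design is our own assumption rather than the exact setup of the cited studies.

```python
# Hypothetical sketch of personalized complex word identification:
# predict whether each learner knows each word from the word's log
# frequency, with a per-learner indicator for personalization.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_learners, n_words = 5, 40
log_freq = rng.uniform(2.0, 12.0, size=n_words)   # toy log word frequencies
ability = rng.normal(0.0, 1.0, size=n_learners)   # toy learner abilities

# Simulate vocabulary-test responses: frequent words + able learners -> known.
logits = ability[:, None] + 0.8 * (log_freq[None, :] - 7.0)
y = (rng.random((n_learners, n_words)) < 1 / (1 + np.exp(-logits))).astype(int)

# Features: log frequency plus a one-hot learner indicator.
X = np.concatenate([
    np.repeat(log_freq, n_learners)[:, None],
    np.tile(np.eye(n_learners), (n_words, 1)),
], axis=1)
labels = y.T.reshape(-1)  # (word, learner) ordering, matching X

clf = LogisticRegression(max_iter=1000).fit(X, labels)
print("training accuracy:", clf.score(X, labels))
```

This word-level classifier is the kind of baseline the Experiments section later compares against; it cannot distinguish different usages of the same word.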
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Preliminary System and Experiments</head><p>Before entering the technical details of our method described in the Proposed Method section, we first show the preliminary system and some experiment results to introduce the motivation of the proposed method.</p><p>The preliminary system visualizes contextualized word embeddings by using the conventional visualization of principal component analysis (PCA). Figure <ref type="figure" target="#fig_2">3</ref> shows the layout of the preliminary system. Once a user provides a word to the system, it automatically searches for the word in the corpus in a similar way to typical concordancers. Unlike concordancers, the system has a database that stores contextualized word embeddings for each usage, or occurrence, of each word in the corpus. We used half a million sentences from the British National Corpus (BNC Consortium 2007) as the raw corpus. We built the database by applying the bert-base-uncased model of the PyTorch Pretrained BERT project (https://github.com/huggingface/pytorch-pretrained-BERT) <ref type="bibr" target="#b3">(Devlin et al. 2019</ref>) to the corpus. We used the last layer, which is the most distant from the surface input, as the embeddings.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Choice of dimension reduction methods</head><p>Principal component analysis (PCA) and t-SNE <ref type="bibr" target="#b18">(Maaten and Hinton 2008)</ref> are famous dimension reduction methods, and t-SNE is notable for its intuitiveness and well-clustered points <ref type="bibr" target="#b18">(Maaten and Hinton 2008)</ref>.</p><p>Although we are aware of t-SNE, we did not employ it for visualization, for the following reasons. First, in our visualization, the distances between usage points are important. While t-SNE often produces intuitive clusters of data points, the distances between points in its visualization are complicated to interpret compared to those of PCA. Hence, to interpret distances between points, PCA is preferable. This is stated in the original t-SNE <ref type="bibr" target="#b18">(Maaten and Hinton 2008)</ref> paper. Moreover, many blog posts for engineers, such as<ref type="foot" target="#foot_1">3</ref> , address this fact to encourage the proper understanding of t-SNE. For these reasons, we employed PCA as the basis of our visualization.</p><p>Second, even if the data to visualize are fixed, t-SNE returns different results depending on its hyperparameter called perplexity. In contrast, PCA returns the same results if the data to visualize are fixed. This dependence on the hyperparameter is elaborated in the original t-SNE paper <ref type="bibr" target="#b18">(Maaten and Hinton 2008)</ref> in the first place. We can also find blog posts targeting engineers that advocate carefully setting the perplexity parameter, such as<ref type="foot" target="#foot_2">4</ref> . Obtaining various results on fixed data can be useful when the data are difficult to pre-process into a form that subsequent dimension-reduction methods can easily handle. However, in this study, the data to be visualized are embedding vectors; hence, the data can easily be pre-processed before we feed them into the visualization method. 
Hence, for the purposes of this study, the property that the results vary on fixed data is unlikely to be useful. Rather, it may complicate the interpretation of the visualization.</p><p>Third, practically, t-SNE is computationally heavy compared to PCA. Computing a t-SNE visualization involves calculations for every pair of the given data points. While studies such as <ref type="bibr" target="#b28">(Tang et al. 2016)</ref> address how to deal with this heavy computational complexity, in practice t-SNE is usually computationally heavy compared to PCA. Strictly speaking, PCA has a similar complexity, as it involves the computation of singular values and vectors in singular value decomposition (SVD). However, because the calculation of SVD has a number of applications other than PCA-based visualization, sophisticated calculation methods for large data have been proposed <ref type="bibr" target="#b7">(Halko et al. 2011)</ref>.</p></div>
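A minimal NumPy sketch of the PCA projection discussed above; the random 768-dimensional vectors stand in for contextualized usage embeddings. It illustrates the two properties motivating the choice: the map is linear, so plotted distances are easy to interpret, and the result is deterministic for fixed input.

```python
# PCA to 2 dimensions via SVD: center the data, then project onto the
# top-2 right singular vectors (principal components).
import numpy as np

def pca_2d(X):
    """Project rows of X onto the top-2 principal components."""
    Xc = X - X.mean(axis=0)                 # center the data
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:2].T                    # linear map to 2 dimensions

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 768))             # stand-in for BERT usage vectors

P1 = pca_2d(X)
P2 = pca_2d(X)
# Deterministic: identical output on identical input (unlike t-SNE,
# whose output depends on the perplexity hyperparameter and random seed).
print(np.allclose(P1, P2))
```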
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Preliminary System by using PCA</head><p>We built a preliminary system and conducted some experiments to see how contextualized word embedding vectors are plotted in the system. Figure <ref type="figure" target="#fig_3">4</ref> depicts an example of searching for the word book. Users can directly type the word in the textbox shown at the top of Figure <ref type="figure" target="#fig_3">4</ref>. Below it are the visualization of the usages found and their list. Each dark-colored point is linked to a usage. Two dark colors are used to color each usage point according to the results of a Gaussian mixture model (GMM) clustering with 2 components, as this value was reported to work well (Athiwaratkun, Wilson, and Anandkumar 2018). The light-red colored point is the probe point: the usages are listed in order of proximity to the probe point. No usage is linked to the probe point. Users can freely and interactively drag the probe point to change the list of usages below the visualization. Each line of the list shows the usage identification number and the surrounding words of the usage, followed by a checkbox to record the usage so that learners can refer to it later. In Figure <ref type="figure" target="#fig_3">4</ref>, the probe point is on the left part of the visualized figure. In the first several lines of the list, the system successfully shows the usages of the word book as a publication. In contrast, Figure <ref type="figure" target="#fig_4">5</ref> depicts the case in which the user drags the probe point from the left to the right of the visualization. The first several lines of the list show the usages of the word book that mean to reserve. We can see that the words surrounding the word book vary: merely focusing on the surrounding words, such as "to" before book, cannot distinguish the usages of book meaning to reserve from the usages of book for reading.</p></div>
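The probe-point interaction can be sketched as a nearest-first sort of usage points around the probe position; the coordinates below are toy values, and `usages_near_probe` is a hypothetical helper, not the system's actual code.

```python
# Sketch of the probe-point interaction: given 2-D usage points and a
# probe position, list usage indices in order of distance to the probe.
import numpy as np

def usages_near_probe(points_2d, probe):
    d = np.linalg.norm(points_2d - probe, axis=1)
    return np.argsort(d)          # usage ids, nearest first

# Two toy "senses": points 0-1 in one region, points 2-3 in another.
points = np.array([[0.0, 0.0], [0.1, 0.2], [3.0, 3.1], [2.9, 2.8]])
print(usages_near_probe(points, np.array([0.0, 0.0])))  # one sense first
print(usages_near_probe(points, np.array([3.0, 3.0])))  # the other first
```

Dragging the probe point simply re-runs this sort, which is why the listed usages flip from one sense to the other as the probe crosses the figure.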
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Clustering Results</head><p>The GMM clustering was accurate but not perfect: 0 errors in the 42 usages of "book", 1 error in the 22 usages of "bank", when manually checked in the excerpt. Hence, learners can choose not to use this, as in the video. Figure <ref type="figure" target="#fig_5">6</ref> shows the variance of the usage vectors of each word against its log frequency in the excerpt. It showed a statistically significant moderate correlation (r = 0.56, p &lt; 0.01 by F-test), implying that frequent words tend to have complex usages.</p></div>
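A sketch of the 2-component GMM clustering used to color the usage points, assuming scikit-learn is available; the two synthetic Gaussian blobs stand in for the BERT usage vectors of two senses of a word.

```python
# GMM clustering with 2 components, as used to color the usage points;
# synthetic 2-cluster "usage vectors" replace the real BERT embeddings.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
sense_a = rng.normal(loc=0.0, scale=0.3, size=(30, 2))  # e.g. "book" = publication
sense_b = rng.normal(loc=3.0, scale=0.3, size=(12, 2))  # e.g. "book" = to reserve
X = np.vstack([sense_a, sense_b])

gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
labels = gmm.predict(X)
# With well-separated blobs, each synthetic sense should fall into one
# component (component ids may be swapped between runs of the data).
print(len(set(labels[:30])), len(set(labels[30:])))
```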
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Motivating Examples</head><p>From the example of "book" in the previous sections, we can easily see that the usages of "book" about reading are more frequent than those of "book" about a reservation. Hence, when counting the number of usages, it is intuitive to assume that learners are not familiar with all usages but are familiar with the usages within a certain radius in the vector space. This is the motivation of our method described in the next section.</p><p>Before entering the technical details of our visualization method in the next section, we show some usage prediction result examples of our method in a manner similar to the previous examples of "book" so that readers can intuitively understand our motivation, as shown in Figure <ref type="figure" target="#fig_6">7</ref> and Figure <ref type="figure" target="#fig_7">8</ref>. The markers are changed to triangles to denote that the colors reflect prediction results, rather than the GMM-based clustering results explained above. The coloring and darkness of the points in the visualization follow those of the previous examples; the light-red colored point is the probe point, and the other, dark points denote usages. Figure <ref type="figure" target="#fig_6">7</ref> shows an example of the familiar usage prediction in the case of searching for the word "haunt". The right-hand side of the cross-marked circle is the area in which usages are predicted to be familiar to this learner. The probe point is located within the circle. We can see that the usages of "haunt" about chasing are listed below. Figure <ref type="figure" target="#fig_7">8</ref> shows another example for "haunt". As the probe point is located outside of the circle, on the left side of the visualization, the list below shows the usages predicted to be unfamiliar to this learner. We can see that the usages of "haunt" meaning "to curse" are mainly listed. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Proposed Method</head><p>As stated in the Related Work section, some previous studies address methods to predict the words that a learner knows based on his/her short vocabulary test result. However, our application requires personalized prediction of the usages of a word that the learner does or does not know. Hence, we propose a novel model for this purpose.</p><p>Let us write the set of words as {v_1, . . . , v_I}, where I is the number of words (in type), and write the set of learners as {l_1, . . . , l_J}, where J is the number of learners. Then, in previous studies, based on the Rasch model <ref type="bibr" target="#b25">(Rasch 1960;</ref><ref type="bibr" target="#b1">Baker 2004)</ref>, the following logistic regression model, Equation 1, is used to predict whether learner l_j knows word v_i or not. Here, σ(x) := 1/(1 + exp(−x)), and y_{i,j} is the response of the learner in the vocabulary test; y_{i,j} = 1 if learner l_j answered the question of word v_i correctly, and y_{i,j} = 0 otherwise. We have two types of parameters to tune: a_{l_j} is the ability of learner l_j, and d_{v_i} is the difficulty of word v_i.</p><formula xml:id="formula_0">P(y_{i,j} = 1 | l_j, v_i) = σ(a_{l_j} − d_{v_i})<label>(1)</label></formula><p>Here, how to model d_{v_i}, the difficulty parameter of word v_i, is the key to our purpose. Previous studies report that the negative logarithm of the word frequency correlates well with the perceived difficulty of words <ref type="bibr" target="#b27">(Tamayo 1987;</ref><ref type="bibr" target="#b2">Beglar 2010</ref>). As in Figure <ref type="figure" target="#fig_0">1</ref>, our key idea is to count the frequency of word usages only within a certain distance from the typical usage of the word. Hence, we propose the following model to implement this idea.</p><p>For each v_i, we have n_i vectors that are the vector representations of the n_i occurrences of word v_i. We write these vectors as X_i = {x_{1,i}, . . . , x_{n_i,i}}. Each vector x_{k,i} is T_1-dimensional. Among X_i, let c_i be the one closest to their center (1/n_i) Σ_{k=1}^{n_i} x_{k,i}. Let freq(v_i) be the frequency of the vectors in X_i within distance ε measured from the central vector c_i. We write this frequency simply as</p><formula xml:id="formula_1">freq(v_i) = N(c_i, ε, X_i).</formula><p>Here, each x_{k,i} is a usage vector obtained from a contextualized word embedding method. Let ReLU(z) = max(0, z) be the rectified linear unit function, and let M be a large positive constant, such as 100. Let G be a linear projection matrix from a T_1-dimensional space to a T_2-dimensional space. Let d_e(a, b) be the Euclidean distance between two vectors. By using these formulations, we model the difficulty of words as follows:</p><formula xml:id="formula_2">d_{v_i} = − log(freq(v_i) + 1) (2) freq(v_i) = N(c_i, ε, X_i) (3) ≈ Σ_{k=1}^{n_i} tanh(M · ReLU(ε − d_e(G c_i, G x_{k,i})))<label>(4)</label></formula><p>The tricky part is that Equation 3 can be approximately written as Equation <ref type="formula">4</ref>, whose parameters can be easily tuned and optimized by using a neural machine learning framework such as PyTorch. In Equation <ref type="formula">4</ref>, due to the ReLU function, negative values inside the function are simply ignored. Hence, as d_e is the Euclidean distance, if ε = 0, i.e., the size of the circle is 0, the term inside the ReLU is non-positive, and freq(v_i) = 0. If ε − d_e(G c_i, G x_{k,i}) &gt; 0, then, due to M and tanh, the resulting value is almost 1. This means that we count only the cases in which d_e does not surpass ε, i.e., we count the usages within distance ε measured from c_i.</p><p>Notably, the following characteristics are important to understand our model.</p></div>
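Equations 2-4 can be sketched in NumPy as follows (the paper optimizes the same expression with PyTorch so that the parameters are trainable); the toy vectors, the identity projection G, and the ε values below are illustrative assumptions.

```python
# NumPy sketch of the soft usage count of Equations 3-4: usages within
# distance eps of the central usage c (after projection by G) each
# contribute about 1 via tanh(M * ReLU(eps - distance)); usages outside
# contribute 0.
import numpy as np

def soft_freq(c, X, G, eps, M=100.0):
    """Approximate count of usage vectors within eps of center c."""
    d = np.linalg.norm((X - c) @ G.T, axis=1)   # d_e(G c, G x_k)
    return np.tanh(M * np.maximum(0.0, eps - d)).sum()

rng = np.random.default_rng(0)
G = np.eye(2)                                   # identity projection for the toy case
c = np.zeros(2)
near = rng.normal(scale=0.1, size=(8, 2))       # usages near the center
far = rng.normal(loc=5.0, scale=0.1, size=(4, 2))  # usages of another sense
X = np.vstack([near, far])

print(soft_freq(c, X, G, eps=1.0))              # only the 8 nearby usages count
print(soft_freq(c, X, G, eps=0.0))              # eps = 0 counts nothing
# Word difficulty then follows Equation 2: d = -log(freq + 1).
print(-np.log(soft_freq(c, X, G, eps=1.0) + 1.0))
```

Because every operation (norm, maximum, tanh, sum) is differentiable almost everywhere, the same expression written in PyTorch lets ε, G, and c be tuned by automatic differentiation, which is exactly the point of the tanh/ReLU approximation.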
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Not merely a logistic regression</head><p>Notably, the proposed model is not merely a logistic regression. Our model has additional parameters, such as ε, c_i, a_{l_j}, and G. Because of these extra parameters, to train our model we typically need to use a neural network machine learning framework, such as PyTorch, for modeling and optimization. As it is difficult to differentiate the loss functions of such models by hand, the loss function should be mostly continuous and smooth so that its parameters can be tuned using automatic differentiation. We specifically designed Equation 4 to meet these conditions. In the experiments, we used the Adam optimization method <ref type="bibr" target="#b11">(Kingma and Ba 2015)</ref> to optimize the loss function.</p></div>
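For reference, the baseline Rasch model of Equation 1 can be fitted with plain gradient descent on the logistic loss, as in the minimal sketch below; the toy responses are simulated from the model itself, and the learning rate and iteration count are arbitrary choices (the paper uses Adam in a neural framework rather than this hand-derived gradient).

```python
# Minimal fit of the Rasch-style response model of Equation 1,
# P(known) = sigmoid(a_learner - d_word), by gradient descent on the
# logistic loss over a toy learner-by-word response matrix.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
J, I = 20, 30                                  # learners, words
a_true = rng.normal(size=J)                    # true learner abilities
d_true = rng.normal(size=I)                    # true word difficulties
Y = (rng.random((J, I)) < sigmoid(a_true[:, None] - d_true[None, :])).astype(float)

a = np.zeros(J); d = np.zeros(I)
for _ in range(500):
    P = sigmoid(a[:, None] - d[None, :])
    grad = P - Y                               # dLoss/dlogit for logistic loss
    a -= 0.05 * grad.sum(axis=1) / I           # ability: negative gradient step
    d += 0.05 * grad.sum(axis=0) / J           # difficulty enters with opposite sign
pred_acc = ((sigmoid(a[:, None] - d[None, :]) > 0.5) == (Y > 0.5)).mean()
print("training accuracy:", pred_acc)
```

The proposed model replaces the free difficulty parameter d with the usage-count-based expression of Equation 2, which is why its extra parameters (ε, c_i, G) require an automatic-differentiation framework instead of this closed-form gradient.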
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Trainable G</head><p>As Equation 4 is mostly continuous and smooth, the matrix G can also be trained by using deep-learning framework software. As G is a projection matrix from T_1 to T_2 dimensions, if we set T_2 = 2 to consider a projection to a two-dimensional space, training G via supervision means training the visualization via supervision. Here, in our task setting, the supervision is a vocabulary test dataset of second language learners, i.e., a matrix in which the (j, i)-th element denotes whether learner l_j correctly answered the question of word v_i.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>ε j : Personalized ε</head><p>In Equation <ref type="formula">4</ref>, for ease of understanding, we write ε as a constant that does not depend on the learner index j. In practice, we can personalize ε by making it depend on the learner index j as ε j ; in this case, each learner l j has his/her own region of usages that he/she can understand, and the radius of this region is ε j . This personalized version is the one we used in the experiments.</p></div>
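Personalization replaces the scalar ε with a length-J vector of per-learner radii. A hypothetical sketch (our names) reusing the soft count of Equation 4:

```python
import torch

def soft_freq_for_learner(j, eps_j, c, X, G, M=100.0):
    """Soft usage count for learner j, using that learner's own
    radius eps_j[j] (names are illustrative)."""
    d = torch.norm(X @ G.T - G @ c, dim=1)
    return torch.tanh(M * torch.relu(eps_j[j] - d)).sum()

eps_j = torch.nn.Parameter(torch.tensor([0.5, 2.0, 8.0]))  # one radius per learner
```

A learner with a larger eps_j[j] is credited with more of the word's usages, matching the interpretation of ε j as a proxy for ability.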
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Experiments Quantitative Results of Prediction</head><p>Quantitative evaluation of this personalized prediction of the usages of a word is difficult; it would require testing each learner multiple times on different usages of the same word. However, when tested with the same word multiple times, learners easily notice that the word has multiple meanings. Hence, we instead evaluated the accuracy of personalized prediction of the words that each learner knows, under an experimental setting similar to <ref type="bibr" target="#b6">(Ehara 2018)</ref>. Our proposed method is based on neural classification with a novel extension that adjusts the counted frequency to the usages within distance ε j . Since a typical logistic-regression classifier is identical to a one-layer neural classifier, comparing our model with a typical logistic-regression classifier that uses a frequency feature indirectly evaluates, in terms of accuracy, how practical the idea of adjusted frequency is.</p><p>The proposed model estimates the number of occurrences, i.e., usages, that each learner knows. In other words, it can be regarded as modifying the word frequency so that the model fits the given vocabulary test dataset. In this regard, we can evaluate how well the proposed model corrects word frequency when an unbalanced corpus is given. Each document in the British National Corpus (BNC) (BNC Consortium 2007) is annotated with a domain, as listed in Table <ref type="table" target="#tab_0">1</ref>. We evaluated how well the proposed model can correct the word frequency in the "arts" domain.</p><p>We used the vocabulary test result data in which each of 100 learners answered 31 vocabulary questions on the publicly available dataset <ref type="bibr" target="#b6">(Ehara 2018)</ref>. Of the 3,100 vocabulary test responses, we used 1,800 to train the model, and the rest were used for testing. 
The baseline model is a simple logistic regression in which the logarithm of word frequency is the only feature. Log word frequency has long been used as a rough measure of word difficulty and has previously been used to analyze and predict word difficulty from vocabulary test data <ref type="bibr" target="#b2">(Beglar 2010;</ref><ref type="bibr" target="#b5">Ehara et al. 2013;</ref><ref type="bibr" target="#b14">Lee and Yeung 2018;</ref><ref type="bibr" target="#b30">Yeung and Lee 2018</ref>). The proposed model counts only the usages within the radius ε j . We used the PyTorch neural network framework to automatically tune the radius ε j and the center of the sphere via its automatic gradient support <ref type="bibr" target="#b21">(Paszke et al. 2017)</ref>. We first performed experiments with T 1 = T 2 and G = I, a setting in which no projection is performed and the model deals with T 1 -dimensional hyperspheres. Table <ref type="table">2</ref> shows the results. The accuracy of predicting learners' vocabulary test responses using the biased text of the arts domain alone is lower than that using the word frequency over all domains. The proposed method improved the accuracy over the raw word frequency of the arts domain by counting only the frequency in the region of the contextualized word embedding space to which the examinee is estimated to respond. This effect was also observed over all domains, which appears to result from the proposed method excluding outlier usages from the frequency count. The improvement in accuracy before and after correction was statistically significant (p &lt; 0.01, Wilcoxon test) both when modifying word frequencies in the arts domain alone and in all domains.</p></div>
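The baseline above can be sketched as a one-layer neural classifier with log frequency as its single feature. The data below are synthetic stand-ins for the BNC frequencies and test responses, used only to show the shape of the baseline.

```python
import torch

torch.manual_seed(0)
x = torch.rand(200, 1) * 9.0                                  # toy log word frequencies
y = ((x.squeeze(1) + 0.5 * torch.randn(200)) > 4.5).float()   # toy known/unknown responses

w = torch.nn.Parameter(torch.zeros(1))
b = torch.nn.Parameter(torch.zeros(1))
opt = torch.optim.Adam([w, b], lr=0.1)
for _ in range(300):
    p = torch.sigmoid(x.squeeze(1) * w + b)   # logistic regression = one-layer net
    loss = torch.nn.functional.binary_cross_entropy(p, y)
    opt.zero_grad(); loss.backward(); opt.step()
acc = ((p > 0.5).float() == y).float().mean().item()
```

Swapping the fixed log-frequency feature for the adjusted soft count of Equation 4 turns this baseline into the proposed model.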
<div xmlns="http://www.tei-c.org/ns/1.0"><head>"Trained" visualization</head><p>In the above experiments, we considered the case in which no projection was conducted by fixing G = I. Next, let us consider the case in which G is a projection to a two-dimensional space, i.e., G is a 2 × T 1 matrix. Tuning G and the radius ε j to fit the vocabulary test dataset using Equation 4 means that we can actually train the visualization to explain the vocabulary test dataset in a supervised manner. Figure <ref type="figure" target="#fig_8">9</ref> and Figure <ref type="figure" target="#fig_9">10</ref> show the resulting visualizations. The initial value of G was set to a two-dimensional projection matrix obtained by principal component analysis (PCA). Although the initial value is the PCA projection, note that the projection matrix G itself is trained from the vocabulary test dataset, as is the radius ε j .</p><p>From Figure <ref type="figure" target="#fig_8">9</ref> and Figure <ref type="figure" target="#fig_9">10</ref>, we can see that the proposed method counts only the main meanings within the red circle. To qualitatively evaluate the results, Table <ref type="table">3</ref> shows the two farthest and the two closest example occurrences of "period" from its center point, i.e., the center of the red circle in Figure <ref type="figure" target="#fig_8">9</ref>. The farthest cases are usages in technical terms such as "period pain" and "magnetic field period", whereas the two closest cases are usages of the noun denoting a span of time, such as "this period" and "the period".</p><p>Table <ref type="table">3</ref>: Farthest (F.) and closest (C.) two occurrences of "period" from the center of the circle in Figure <ref type="figure" target="#fig_8">9</ref>.</p><p>F. period pains can be severe and disruptive. F. to produce a slight spread of magnetic field period . C.</p><p>design during this period was in the plan . 
C.</p><p>the pub designer of the period , </p></div>
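The PCA initialization of G mentioned above can be sketched with a thin SVD; the trained G then departs from this starting point as it fits the test responses. Shapes are illustrative.

```python
import torch

torch.manual_seed(0)
X = torch.randn(100, 8)        # toy usage vectors (T1 = 8)
Xc = X - X.mean(0)             # center before PCA
U, S, Vh = torch.linalg.svd(Xc, full_matrices=False)
G0 = Vh[:2]                    # top-2 principal directions: initial 2 x T1 matrix G
Z = Xc @ G0.T                  # initial 2-D coordinates for plotting
```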
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Conclusions</head><p>In this paper, we proposed a supervised visualization method that predicts which usages of a word are known to each learner, using a vocabulary test result dataset as supervision. Our neural method automatically tunes the projection matrix used for visualization and each learner's radius in the visualization so that the counted frequency within the circles fits the supervision. Experiments on actual learner response data show that the proposed method predicts responses more accurately by correcting the frequency, even when the usages are biased toward a specific domain. As future work, we plan to make our method more interactive.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: Usage of "haunt" predicted to be familiar to the learner.</figDesc><graphic coords="2,319.50,54.00,238.50,225.00" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2: An example of concordancer.</figDesc><graphic coords="2,319.50,319.81,238.50,71.85" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Figure 3 :</head><label>3</label><figDesc>Figure 3: System layout. CWE means contextualized word embeddings.</figDesc><graphic coords="4,83.81,54.00,178.88,129.38" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_3"><head>Figure 4 :</head><label>4</label><figDesc>Figure 4: Example of searching the word book.</figDesc><graphic coords="4,83.81,233.24,178.87,209.95" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_4"><head>Figure 5 :</head><label>5</label><figDesc>Figure 5: Another example of searching the word book.</figDesc><graphic coords="4,349.31,54.00,178.87,183.21" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_5"><head>Figure 6 :</head><label>6</label><figDesc>Figure 6: Variance of usage vectors vs. log word frequency.</figDesc><graphic coords="5,83.81,54.00,178.88,136.46" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_6"><head>Figure 7 :</head><label>7</label><figDesc>Figure 7: Recap: usage of "haunt" predicted to be known to the learner.</figDesc><graphic coords="5,349.31,54.00,178.88,168.76" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_7"><head>Figure 8 :</head><label>8</label><figDesc>Figure 8: Usage of "haunt" predicted not to be known to the learner.</figDesc><graphic coords="6,83.81,54.00,178.88,168.91" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_8"><head>Figure 9 :</head><label>9</label><figDesc>Figure 9: Trained visualization example of "period". Each triangle point represents an occurrence, or a usage, of the word "period" in the "arts" domain of the BNC corpus. The projection G of the original contextualized word embedding vectors to the two-dimensional space and the radius ε j were optimized to fit the vocabulary test dataset (ref. Equation 1, Equation 3, and Equation 4). Intuitively, a large ε j denotes that learner l j has a high language ability, as he/she is estimated to understand many of the occurrences of the word "period" within the red circle.</figDesc><graphic coords="8,54.00,151.11,238.50,169.10" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_9"><head>Figure 10 :</head><label>10</label><figDesc>Figure 10: Trained visualization example of "figure". The setting of the training is identical to that of Figure 9.</figDesc><graphic coords="8,54.00,435.22,238.50,165.49" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1 :</head><label>1</label><figDesc>Number of sentences in each domain of the BNC corpus, out of a total of 100,000 sentences.</figDesc><table><row><cell cols="2">imaginative</cell><cell>21,946</cell></row><row><cell>arts</cell><cell></cell><cell>18,289</cell></row><row><cell cols="2">natural sciences</cell><cell>5,256</cell></row><row><cell cols="2">social science</cell><cell>7,777</cell></row><row><cell cols="2">commerce</cell><cell>4,378</cell></row><row><cell>leisure</cell><cell></cell><cell>20,300</cell></row><row><cell cols="2">belief and thought</cell><cell>3,441</cell></row><row><cell cols="2">world news</cell><cell>764</cell></row><row><cell cols="2">applied science</cell><cell>2,625</cell></row><row><cell cols="2">world affairs</cell><cell>15,224</cell></row><row><cell cols="3">Table 2: Accuracy of predicting learners' vocabulary test re-</cell></row><row><cell cols="3">sponses by using the raw frequencies and the corrected fre-</cell></row><row><cell cols="3">quencies by the proposed model in each domain.</cell></row><row><cell>Domain</cell><cell cols="2">Correction Accuracy</cell></row><row><cell>Arts</cell><cell>Raw</cell><cell>0.61</cell></row><row><cell>Arts</cell><cell>Corrected</cell><cell>0.64</cell></row><row><cell>All domains</cell><cell>Raw</cell><cell>0.67</cell></row><row><cell>All domains</cell><cell>Corrected</cell><cell>0.72</cell></row></table></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">https://lextutor.ca/conc/eng/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_1">https://mlexplained.com/2018/09/14/paper-dissectedvisualizing-data-using-t-sne-explained/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="4" xml:id="foot_2">https://distill.pub/2016/misread-tsne/</note>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Probabilistic FastText for multi-sense word embeddings</title>
		<author>
			<persName><forename type="first">B</forename><surname>Athiwaratkun</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Wilson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Anandkumar</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of ACL</title>
				<meeting>of ACL</meeting>
		<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<monogr>
		<title level="m" type="main">Item Response Theory : Parameter Estimation Techniques</title>
		<author>
			<persName><forename type="first">F</forename><forename type="middle">B</forename><surname>Baker</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2004">2004</date>
			<publisher>CRC Press</publisher>
		</imprint>
	</monogr>
	<note>Second Edition</note>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">A rasch-based validation of the vocabulary size test</title>
		<author>
			<persName><forename type="first">D</forename><surname>Beglar</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">The British National Corpus, version 3</title>
				<editor>
			<persName><forename type="first">T</forename></persName>
		</editor>
		<imprint>
			<publisher>BNC Consortium</publisher>
			<date type="published" when="2007">2010. 2007</date>
			<biblScope unit="volume">27</biblScope>
			<biblScope unit="page" from="101" to="118" />
		</imprint>
	</monogr>
	<note>BNC XML Edition</note>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Bert: Pre-training of deep bidirectional transformers for language understanding</title>
		<author>
			<persName><forename type="first">J</forename><surname>Devlin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M.-W</forename><surname>Chang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Toutanova</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of NAACL</title>
				<meeting>of NAACL</meeting>
		<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Relieving polysemy problem for synonymy detection</title>
		<author>
			<persName><forename type="first">G</forename><surname>Dias</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Moraliyski</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Portuguese Conference on Artificial Intelligence</title>
				<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2009">2009</date>
			<biblScope unit="page" from="610" to="621" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Mining words in the minds of second language learners: learnerspecific word difficulty</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Ehara</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Sato</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Oiwa</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Nakagawa</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of COLING</title>
				<meeting>of COLING</meeting>
		<imprint>
			<date type="published" when="2012">2012. 2013</date>
			<biblScope unit="volume">4</biblScope>
		</imprint>
	</monogr>
	<note>Personalized reading support for second-language web documents</note>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Building an english vocabulary knowledge dataset of japanese english-as-a-second-language learners using crowdsourcing</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Ehara</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. LREC</title>
				<meeting>LREC</meeting>
		<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">An algorithm for the principal component analysis of large data sets</title>
		<author>
			<persName><forename type="first">N</forename><surname>Halko</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P.-G</forename><surname>Martinsson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Shkolnisky</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Tygert</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">SIAM Journal on Scientific computing</title>
		<imprint>
			<biblScope unit="volume">33</biblScope>
			<biblScope unit="issue">5</biblScope>
			<biblScope unit="page" from="2580" to="2594" />
			<date type="published" when="2011">2011</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Combining Lexical and Grammatical Features to Improve Readability Measures for First and Second Language Texts</title>
		<author>
			<persName><forename type="first">M</forename><surname>Heilman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Collins-Thompson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Callan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Eskenazi</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of NAACL</title>
				<meeting>of NAACL<address><addrLine>Rochester, New York</addrLine></address></meeting>
		<imprint>
			<publisher>Association for Computational Linguistics</publisher>
			<date type="published" when="2007">2007</date>
			<biblScope unit="page" from="460" to="467" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Virtual Bartender: A Dialog System Combining Data-Driven and Knowledge-Based Recommendation</title>
		<author>
			<persName><forename type="first">K</forename><surname>Hinkelmann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Blaser</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Faust</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Horst</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Mehli</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">AAAI Spring Symposium: Combining Machine Learning with Knowledge Engineering</title>
				<imprint>
			<date type="published" when="1987">2019. 1987</date>
			<biblScope unit="volume">2</biblScope>
			<biblScope unit="page" from="125" to="131" />
		</imprint>
	</monogr>
	<note>The Oxford Concordance Program Version 2</note>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">TANGO: Bilingual collocational concordancer</title>
		<author>
			<persName><forename type="first">J.-Y</forename><surname>Jian</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y.-C</forename><surname>Chang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">S</forename><surname>Chang</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of ACL demo</title>
				<meeting>of ACL demo</meeting>
		<imprint>
			<date type="published" when="2004">2004</date>
			<biblScope unit="page" from="166" to="169" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Adam: A method for stochastic optimization</title>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">P</forename><surname>Kingma</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Ba</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of ICLR</title>
				<meeting>of ICLR</meeting>
		<imprint>
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Understanding Blackbox Predictions via Influence Functions</title>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">W</forename><surname>Koh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Liang</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of ICML</title>
				<meeting>of ICML</meeting>
		<imprint>
			<date type="published" when="2017">2017</date>
			<biblScope unit="page" from="1885" to="1894" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">Towards an Assistive and Pattern Learning-driven Process Modeling Approach</title>
		<author>
			<persName><forename type="first">E</forename><surname>Laurenzi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Hinkelmann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Jüngling</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Montecchiari</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Pande</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Martin</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">AAAI Spring Symposium: Combining Machine Learning with Knowledge Engineering</title>
				<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">Personalizing lexical simplification</title>
		<author>
			<persName><forename type="first">J</forename><surname>Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">Y</forename><surname>Yeung</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of COLING</title>
				<meeting>of COLING</meeting>
		<imprint>
			<date type="published" when="2018">2018</date>
			<biblScope unit="page" from="224" to="232" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">Visual exploration of semantic relationships in neural word embeddings</title>
		<author>
			<persName><forename type="first">S</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P.-T</forename><surname>Bremer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">J</forename><surname>Thiagarajan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Srikumar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Livnat</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Pascucci</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE trans. on vis. and comp. g</title>
		<imprint>
			<biblScope unit="volume">24</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="553" to="562" />
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">A Unified Approach to Interpreting Model Predictions</title>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">M</forename><surname>Lundberg</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S.-I</forename><surname>Lee</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of NIPS</title>
				<editor>
			<persName><forename type="first">I</forename><surname>Guyon</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">U</forename><forename type="middle">V</forename><surname>Luxburg</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">S</forename><surname>Bengio</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">H</forename><surname>Wallach</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">R</forename><surname>Fergus</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">S</forename><surname>Vishwanathan</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">R</forename><surname>Garnett</surname></persName>
		</editor>
		<meeting>of NIPS</meeting>
		<imprint>
			<date type="published" when="2017">2017</date>
			<biblScope unit="page" from="4765" to="4774" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">FastKwic, an &quot;intelligent&quot; concordancer using FASTR</title>
		<author>
			<persName><forename type="first">V</forename><surname>Lux-Pogodalla</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Besagni</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Fort</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of LREC</title>
				<meeting>of LREC</meeting>
		<imprint>
			<date type="published" when="2010">2010</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">Visualizing data using t-sne</title>
		<author>
			<persName><forename type="first">L</forename><forename type="middle">V D</forename><surname>Maaten</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Hinton</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of machine learning research</title>
		<imprint>
			<biblScope unit="volume">9</biblScope>
			<biblScope unit="page" from="2579" to="2605" />
			<date type="published" when="2008-11">2008. Nov</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">Distributed representations of words and phrases and their compositionality</title>
		<author>
			<persName><forename type="first">T</forename><surname>Mikolov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Sutskever</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">S</forename><surname>Corrado</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Dean</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of NIPS</title>
				<meeting>of NIPS</meeting>
		<imprint>
			<date type="published" when="2013">2013</date>
			<biblScope unit="page" from="3111" to="3119" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<title level="a" type="main">Benchmarking lexical simplification systems</title>
		<author>
			<persName><forename type="first">G</forename><surname>Paetzold</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Specia</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">LREC</title>
				<imprint>
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<monogr>
		<title level="m" type="main">Automatic differentiation in pytorch</title>
		<author>
			<persName><forename type="first">A</forename><surname>Paszke</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Gross</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Chintala</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Chanan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Yang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Devito</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Lin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Desmaison</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Antiga</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Lerer</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b22">
	<analytic>
		<title level="a" type="main">Glove: Global vectors for word representation</title>
		<author>
			<persName><forename type="first">J</forename><surname>Pennington</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Socher</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Manning</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of EMNLP</title>
				<meeting>of EMNLP</meeting>
		<imprint>
			<date type="published" when="2014">2014</date>
			<biblScope unit="page" from="1532" to="1543" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b23">
	<analytic>
		<title level="a" type="main">Deep contextualized word representations</title>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">E</forename><surname>Peters</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Neumann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Iyyer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Gardner</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Clark</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Zettlemoyer</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of NAACL</title>
				<meeting>of NAACL</meeting>
		<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b24">
	<analytic>
		<title level="a" type="main">CoKE: Word Sense Induction Using Contextualized Knowledge Embeddings</title>
		<author>
			<persName><forename type="first">S</forename><surname>Ramprasad</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Maddox</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">AAAI Spring Symposium: Combining Machine Learning with Knowledge Engineering</title>
				<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b25">
	<monogr>
		<title level="m" type="main">Probabilistic Models for Some Intelligence and Attainment Tests</title>
		<author>
			<persName><forename type="first">G</forename><surname>Rasch</surname></persName>
		</author>
		<imprint>
			<date type="published" when="1960">1960</date>
			<pubPlace>Copenhagen</pubPlace>
		</imprint>
		<respStmt>
			<orgName>Danish Institute for Educational Research</orgName>
		</respStmt>
	</monogr>
</biblStruct>

<biblStruct xml:id="b26">
	<analytic>
		<title level="a" type="main">Embedding Projector: Interactive Visualization and Interpretation of Embeddings</title>
		<author>
			<persName><forename type="first">J</forename><surname>Read</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">T</forename><surname>Ribeiro</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Singh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Guestrin</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of NIPS 2016 Workshop on Interpretable Machine Learning in Complex Systems</title>
				<meeting>of NIPS 2016 Workshop on Interpretable Machine Learning in Complex Systems<address><addrLine>San Francisco, California</addrLine></address></meeting>
		<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
	<note>Proc. of KDD, 1135-1144</note>
</biblStruct>

<biblStruct xml:id="b27">
	<analytic>
		<title level="a" type="main">Frequency of use as a measure of word difficulty in bilingual vocabulary test construction and translation</title>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">M</forename><surname>Tamayo</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Educational and Psychological Measurement</title>
		<imprint>
			<biblScope unit="volume">47</biblScope>
			<biblScope unit="issue">4</biblScope>
			<biblScope unit="page" from="893" to="902" />
			<date type="published" when="1987">1987</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b28">
	<analytic>
		<title level="a" type="main">Visualizing large-scale and high-dimensional data</title>
		<author>
			<persName><forename type="first">J</forename><surname>Tang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><surname>Mei</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of WWW</title>
				<meeting>of WWW</meeting>
		<imprint>
			<date type="published" when="2016">2016</date>
			<biblScope unit="page" from="287" to="297" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b29">
	<analytic>
		<title level="a" type="main">Subsentential translation memory for computer assisted writing and translation</title>
		<author>
			<persName><forename type="first">J.-C</forename><surname>Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><forename type="middle">C</forename><surname>Chuang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W.-C</forename><surname>Shei</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">S</forename><surname>Chang</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of ACL demo</title>
				<meeting>of ACL demo</meeting>
		<imprint>
			<date type="published" when="2004">2004</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b30">
	<analytic>
		<title/>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">Y</forename><surname>Yeung</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Lee</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of COLING</title>
				<meeting>of COLING</meeting>
		<imprint>
			<date type="published" when="2018">2018</date>
			<biblScope unit="page" from="3448" to="3455" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b31">
	<analytic>
		<title level="a" type="main">A report on the complex word identification shared task</title>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">M</forename><surname>Yimam</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Biemann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Malmasi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">H</forename><surname>Paetzold</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Specia</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Štajner</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Tack</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Zampieri</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of BEA</title>
				<meeting>of BEA</meeting>
		<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
