=Paper= {{Paper |id=Vol-1419/paper0041 |storemode=property |title=Multilingual Distributional Semantic Models: Toward a Computational Model of the Bilingual Mental Lexicon |pdfUrl=https://ceur-ws.org/Vol-1419/paper0041.pdf |volume=Vol-1419 |dblpUrl=https://dblp.org/rec/conf/eapcogsci/Utsumi15 }} ==Multilingual Distributional Semantic Models: Toward a Computational Model of the Bilingual Mental Lexicon== https://ceur-ws.org/Vol-1419/paper0041.pdf
                       Multilingual Distributional Semantic Models:
               Toward a Computational Model of the Bilingual Mental Lexicon
                                                  Akira Utsumi (utsumi@uec.ac.jp)
                               Department of Informatics, The University of Electro-Communications
                                      1-5-1, Chofugaoka, Chofushi, Tokyo 182-8585, Japan


                             Abstract

In this paper, we propose a novel framework of a multilingual distributional semantic model to provide a psychologically plausible computational model of the bilingual mental lexicon. In the proposed framework, a monolingual semantic space for each target language is first generated from the corresponding monolingual corpus. These monolingual semantic spaces are then converted into spaces with common dimensions, which are in turn integrated into a single multilingual semantic space. The language of the dimensions, which we refer to as a pivot language, determines the type of bilinguals simulated by the model. We also tested the psychological plausibility of the proposed multilingual distributional semantic model by comparing the cosine similarity computed by the model with the cross-language word similarity ratings of L1 Japanese/L2 English sequential bilinguals. The bilingual semantic space with Japanese as a pivot language, which is predicted to be a model for L1 Japanese/L2 English sequential bilinguals, achieved better performance in simulating the similarity rating data. This suggests the plausibility of the proposed multilingual model.

Keywords: Multilingual distributional semantic model; Bilingual mental lexicon; Cross-language semantic similarity

                           Introduction

Distributional semantic models (henceforth, DSMs), or semantic space models, are models of the semantic representations of words and of the way those representations are constructed (Turney & Pantel, 2010). Semantic content is represented by a high-dimensional vector, and these vectors are constructed from a corpus by observing the distributional statistics of word occurrence. Despite their simplicity, DSMs have provided a useful framework for cognitive modeling, especially of human semantic knowledge (e.g., Jones, Kintsch, & Mewhort, 2006; Landauer & Dumais, 1997).

However, these studies have explored the mental lexicon of a monolingual speaker, and all the DSMs used in them are monolingual. Given the recent growing interest in bilingualism in cognition (e.g., Bialystok, Craik, & Luk, 2012), it is quite reasonable to consider a multilingual extension of DSMs toward a cognitive model of the bilingual (or multilingual) mental lexicon. This is what this study aims to accomplish.

In the field of natural language processing and computational linguistics, some studies (e.g., Bader & Chew, 2008; Wei, Yang, & Lin, 2008; Widdows, 2004) have proposed multilingual extensions of latent semantic analysis (LSA) for multilingual document clustering and cross-language word similarity computation. What these methods have in common is the use of a parallel corpus: a collection of bilingual (or multilingual) texts comprising sentences (or documents) in one language and their translations in other languages. By regarding aligned texts (i.e., a pair of an original text and its translations) as a single document, the method for monolingual DSMs can be directly applied to multilingual DSMs. Despite this advantage, however, the parallel-corpus-based approach to multilingual DSMs has some drawbacks. One serious drawback is that the use of parallel corpora is not psychologically plausible. It is extremely rare for bilinguals to be exposed to the same message in both languages simultaneously. Bilingual children often use different languages depending on whom they communicate with (i.e., parents or friends) and where they communicate (i.e., at home or outside). It follows that bilingual lexical development and lexical knowledge are very unlikely to be explained by the distributional statistics obtained from a parallel corpus. The other drawback of the parallel-corpus-based approach is a practical one: parallel corpora are generally less readily available, and smaller, than monolingual corpora. Therefore, multilingual semantic spaces generated from a parallel corpus cannot be expected to achieve satisfactory performance.

In this paper, therefore, we propose a novel method for constructing multilingual DSMs toward a cognitive model of the bilingual mental lexicon. To overcome the drawbacks of the parallel-corpus-based approach, our method does not use any parallel corpora; it generates a monolingual semantic space for each target language using a monolingual corpus, and then integrates the multiple monolingual spaces into a single multilingual semantic space. The integration is carried out by aligning context words in different languages either through direct correspondences between words (i.e., lexical links) or via a conceptual representation (i.e., conceptual links). This distinction is motivated by the psychological model of the bilingual mental lexicon (Kroll & Stewart, 1994). We then test the psychological plausibility of the proposed method for multilingual DSMs using the cross-language word similarity ratings of Japanese-English bilinguals (Allen & Conklin, 2014). Finally, we discuss the potential of the proposed method to simulate a variety of findings on the bilingual lexicon and to provide a tool for bilingual research.

                        Computational Model

In this section, we first review a general method for constructing a monolingual semantic space. We then propose a novel method for constructing a multilingual semantic space, by which the monolingual semantic spaces for the target languages are integrated into a single multilingual semantic space.

Monolingual DSM

The method for constructing semantic spaces generally comprises the following three steps:

1. Initial matrix construction: n content words in a given corpus are represented as m-dimensional initial vectors whose

[Figure 1 here: a two-panel diagram. In each panel, Step 1 builds one word cooccurrence matrix from the Japanese corpus and one from the English corpus (e.g., target words 犬 'dog' and 猫 'cat' with context words such as 警察 'police', 吠える 'bark', 鳴く 'mew', ネズミ 'mouse'); Step 2 converts the matrices to pivot-language context words; Step 3 concatenates the converted matrices into one. Panel (a): English as a pivot language. Panel (b): concepts as a pivot language.]

Figure 1: A rough sketch of the multilingual distributional semantic model proposed in this paper: the case of an English-Japanese bilingual semantic space.
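Step 1 of Figure 1 (building each monolingual cooccurrence matrix) amounts to counting, for every target word, the words that occur near it in the corpus. The following is a minimal sketch, not the authors' implementation; it assumes the two-word window used later in this paper, and the tiny "corpus" is an invented example:

```python
from collections import defaultdict

def count_cooccurrences(tokens, window=2):
    """Words-as-contexts counting: for each target token, add one to
    every (target, context) pair within `window` words on either side."""
    counts = defaultdict(int)
    for i, target in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                counts[(target, tokens[j])] += 1
    return counts

# A toy "corpus" of content words (as after stopword removal)
corpus = ["dog", "bark", "police", "dog", "bark"]
counts = count_cooccurrences(corpus)
# counts[("dog", "bark")] == 3, and the counts are symmetric
```

Collecting these counts over a shared vocabulary of n target words and m context words gives the initial n-by-m matrix A described above.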


elements are frequencies in a linguistic context. As a result, an n-by-m matrix A = (a_ij) is constructed using the n word vectors as rows.

2. Weighting: The elements of the matrix A are weighted.

3. Smoothing: The dimensionality of the row vectors of A is reduced from the initial dimension m to r.

As a result, an r-dimensional semantic space containing n words is generated.

For initial matrix construction in Step 1, two popular methods are used for computing the elements a_ij of A. In a "documents-as-contexts" method, an element a_ij is determined as the frequency of a word w_i in a document d_j (i.e., the number of times the word w_i occurs in the document d_j). In a "words-as-contexts" method, on the other hand, a_ij is calculated as the cooccurrence frequency of a target word w_i and a context word w_j within a certain context (i.e., the number of times the two words w_i and w_j cooccur in that context). As the context for counting cooccurrences, we use a "window" spanning two words on either side of the target word. Note that the existing methods for multilingual LSA often employ a documents-as-contexts matrix by regarding aligned texts in a parallel corpus as a single document. In contrast, our method does not


use a parallel corpus, and thus we apply a words-as-contexts method to initial matrix construction, as we will explain in the next subsection.

For Step 2, various weighting methods have been proposed. Two popular ones are entropy-based tf-idf weighting (Landauer & Dumais, 1997) and PPMI (positive pointwise mutual information) weighting (Bullinaria & Levy, 2007; Recchia & Jones, 2009). In this paper, we use PPMI weighting because it is suitable for words-as-contexts matrices and generally achieves good performance (Bullinaria & Levy, 2007; Recchia & Jones, 2009). PPMI is based on pointwise mutual information (PMI) and replaces negative PMI values with zero. The last step (Step 3), smoothing, is optional and is usually carried out using singular value decomposition (SVD). In this paper, we do not smooth the matrix, because PPMI semantic spaces generally achieve good performance even when smoothing is not applied (Recchia & Jones, 2009).

Multilingual DSM

The basic idea underlying our multilingual DSM method is that the word cooccurrence (i.e., words-as-contexts) matrices generated for each target language from a monolingual corpus are converted into cooccurrence matrices sharing the same set of context words; as a result, word vectors in different languages are placed in a single semantic space with common dimensions. The language of the context words can be one of the target languages or another "language" representing concepts. We refer to this language as a pivot language.

Figure 1 illustrates our multilingual DSM method in the case of a Japanese-English bilingual DSM. The first step is to generate a word cooccurrence matrix for each target language using a monolingual corpus. In Figure 1 (a), two cooccurrence matrices, one for Japanese and the other for English, are constructed separately. For example, the Japanese cooccurrence matrix expresses that the target word 犬 'dog' cooccurs with the context word 警察 'police' once and with the context word 吠える 'bark' five times.

In the second step, the monolingual cooccurrence matrix for a target language is converted into a matrix expressing pseudo-cooccurrences between target words in that language and context words in the pivot language. As shown in Figure 1 (a), when English is the pivot language, the Japanese cooccurrence matrix must be converted into a matrix with English context words, while the English cooccurrence matrix does not need to be converted. Conversely, when Japanese is the pivot language, the English cooccurrence matrix is converted but the Japanese matrix is not. Furthermore, when a set of concepts is used as the pivot language, as shown in Figure 1 (b), both cooccurrence matrices are converted into pseudo-cooccurrence matrices with concepts as contexts.

The matrix conversion in Step 2 can be done by translating context words into the pivot language using a dictionary or other lexical database, and by counting the "pseudo" cooccurrence frequency between a word in the target language and a context word in the pivot language. For example, as shown in Figure 1 (a), the context word 警察 has one translation equivalent, police, in English, and the cooccurrence frequency between the target word 犬 and 警察 is 1. It follows that the cooccurrence frequency of 犬 and police is counted as 1. If a context word has more than one equivalent in the pivot language, the pseudo-cooccurrence frequencies for the other equivalents are also counted in the same way. For example, because the context word 吠える has at least two equivalents, bark and roar, the cooccurrence frequency between 犬 and roar, as well as between 犬 and bark, is counted as 5. Note that the context word bark is also an equivalent of the context word 鳴く 'cry, mew', and thus the pseudo-cooccurrence frequency between 犬 and bark is 7 (= 5 from 吠える + 2 from 鳴く). Finally, in the last step (i.e., Step 3), the converted matrix for Japanese and the original cooccurrence matrix for English in Figure 1 (a) (or the converted English matrix in Figure 1 (b)) are concatenated into a single matrix expressing an English-Japanese bilingual semantic space.

In this framework of multilingual DSMs, the pivot language determines the type of bilinguals for which the constructed semantic space is suitable. Because all the words in all target languages are represented through the pivot language, the pivot language can be regarded as the dominant language of the bilinguals (or multilinguals). Hence, the multilingual DSM generated by this method can be regarded as a model of the mental lexicon of sequential bilinguals whose L1 is the pivot language. For example, the multilingual DSM with English as a pivot language shown in Figure 1 (a) is expected to be a cognitive model for L1 English/L2 Japanese bilinguals. When concepts are used as the pivot language, as in Figure 1 (b), the resulting DSM is assumed to be a model for simultaneous bilinguals, who are exposed to bilingual input from birth. This assumption is reasonable because simultaneous bilinguals do not have a dominant language, and their lexical development proceeds in both languages alike through concepts.

In the explanation given above, we used a Japanese-English bilingual DSM as an example, but our method is not specific to bilingualism. Formally, given k target languages L_1, L_2, ..., L_k, the method for constructing a multilingual semantic space on the basis of our idea can be described in the following steps:

1. The cooccurrence matrices A_11, A_22, ..., A_kk for the target languages L_1, L_2, ..., L_k are constructed from the corresponding monolingual corpora by the method for constructing monolingual DSMs.

2. Using conversion matrices D_ip (1 ≤ i ≤ k) from a target language L_i into a pivot language L_p, the cooccurrence matrices A_ii generated above are converted into matrices A_ip with the same dimensions:

       A_ip = A_ii × D_ip                                    (1)

   Note that, if L_i is the pivot language L_p, then A_ip = A_ii.

3. All the converted matrices A_1p, A_2p, ..., A_kp are concatenated vertically into a single matrix A:

       A = [A_1p; A_2p; ...; A_kp]                           (2)
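The steps above, together with the PPMI weighting discussed earlier, can be sketched in code. This is a minimal sketch under toy assumptions, not the authors' implementation; the function names and the romanized dictionary keys below are ours:

```python
import numpy as np

def conversion_matrix(contexts_i, contexts_p, dictionary):
    """Build D_ip: the (s, t) entry is 1 if pivot word t is a
    translation equivalent of context word s, and 0 otherwise."""
    D = np.zeros((len(contexts_i), len(contexts_p)))
    col = {w: t for t, w in enumerate(contexts_p)}
    for s, w in enumerate(contexts_i):
        for equiv in dictionary.get(w, []):
            if equiv in col:
                D[s, col[equiv]] = 1.0
    return D

def ppmi(M):
    """PPMI weighting: pointwise mutual information, negatives -> 0."""
    total = M.sum()
    row = M.sum(axis=1, keepdims=True)
    col = M.sum(axis=0, keepdims=True)
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(M * total / (row * col))
    return np.where(np.isfinite(pmi) & (pmi > 0), pmi, 0.0)

def multilingual_space(A_list, D_list, weight_before=True):
    """Eq. (1): A_ip = A_ii @ D_ip, then Eq. (2): stack vertically.
    `weight_before` selects weighting before or after concatenation."""
    converted = [A @ D for A, D in zip(A_list, D_list)]
    if weight_before:
        return np.vstack([ppmi(A) for A in converted])
    return ppmi(np.vstack(converted))
```

With the toy counts of Figure 1 (a), converting the Japanese matrix (犬 row: 警察 = 1, 吠える = 5, 鳴く = 2, ネズミ = 0) through a dictionary mapping 吠える to {bark, roar} and 鳴く to {mew, bark} reproduces the pseudo-cooccurrences police = 1, bark = 7, mew = 2, mouse = 0, roar = 5 shown in the figure.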

The resulting matrix A represents a multilingual semantic space.

At Step 2, we use a conversion (or term alignment) matrix D_ip to translate context words into the pivot language. The (s, t) entry of the matrix D_ip is 1 (or another nonzero value) if the word w_t in the pivot language L_p is a translation of the word w_s in the language L_i, and 0 otherwise.

Weighting (i.e., Step 2 of the monolingual DSM method presented in the last section) can be applied either after Step 2 or after Step 3 of the above algorithm. Weighting after Step 2 implies that the converted matrices A_ip are weighted before they are concatenated into A, while weighting after Step 3 implies that the concatenated matrix A is weighted. Note that some weighting methods, such as entropy-based ones, give the same matrix A regardless of whether weighting is applied before or after matrix concatenation, because in those methods the word vectors (i.e., the row vectors of A_ip) are weighted independently of each other. In the case of PPMI weighting, however, different matrices A are generated according to the timing of weighting.

                       Evaluation Experiment

Test Data

As test data for evaluating multilingual DSMs, we used the cross-linguistic similarity norms for Japanese-English translations provided by Allen and Conklin (2014). These data comprise semantic similarity and phonological similarity ratings for 193 Japanese-English word pairs, along with other relevant measures. Among these ratings, we used the semantic similarity ratings on a 5-point scale ranging from 1 to 5, and compared them with the cosine similarity computed by the multilingual DSMs to evaluate their modeling performance.

The 193 word pairs are divided into 98 cognates and 95 noncognates. Cognates are words in different languages that share both form and meaning. For example, the Japanese word カメラ /kamera/ and the English word "camera" are cognates. The cognates used in Allen and Conklin's (2014) study are all loanwords in Japanese, i.e., words borrowed from English and written in a separate script, katakana. Noncognates (e.g., 希望 'hope' and "hope") have the same meaning but do not share form. Cognates have been central to psycholinguistic research on bilingual language processing because they provide an effective way of examining the essential question of whether bilinguals selectively activate a single language or activate both languages simultaneously (Dijkstra, 2007).

Allen and Conklin's (2014) semantic similarity norms were collected from native speakers of Japanese who also speak English as a second language, namely L1 Japanese/L2 English speakers. Hence, these semantic similarity data can be regarded as reflecting the mental lexicon of sequential bilinguals whose L1 is Japanese and whose L2 is English.

Materials for Multilingual DSM

As explained before, the multilingual DSM proposed in this paper requires two kinds of language resources, namely a monolingual corpus for each target language and a dictionary (or lexical database) for converting between a pivot language and target languages. In this experiment, we used as monolingual corpora Japanese newspaper articles (i.e., six years' worth of Mainichi newspaper articles) with 41.2M word tokens, and the written, non-fiction parts of the British National Corpus with 54.7M word tokens. In order to determine the vocabulary of the semantic space, we performed the widely used preprocessing steps of stopword removal and lemmatization. As a dictionary, English WordNet 3.0 and Japanese WordNet 1.1 were used. WordNet is not a dictionary, but it can serve as one by connecting words in different languages via synsets. WordNet synsets are sets of cognitive synonyms, each expressing a distinct concept. Synsets provide an additional merit of using WordNet in that they can be used as a pivot language representing concepts (or, more precisely, a pivot concept). The Japanese and English words included in the bilingual semantic spaces were selected so that they can be translated into each other via WordNet. In other words, each of these Japanese words shares at least one synset with at least one of these English words. As a result, 22,416 Japanese words and 18,463 English words were selected as the vocabulary of the bilingual semantic space. These Japanese and English words are related via 23,421 synsets. Therefore, the size of the monolingual cooccurrence matrix in Step 1 of the proposed algorithm was 22,416 × 22,416 for Japanese and 18,463 × 18,463 for English.

Method

First, using the corpora and WordNet mentioned above, we constructed six bilingual semantic spaces from all combinations of three pivot languages (Japanese, English, and synset) and two timings of weighting (before or after matrix concatenation).

Using these six semantic spaces, we computed the cosine similarity of each pair of words in the test data. Note that 12 of the 193 word pairs of Allen and Conklin's (2014) data did not exist in the bilingual semantic spaces, and thus the remaining 181 word pairs (comprising 89 cognates and 92 noncognates) were used for similarity computation.

The performance of each bilingual semantic space was measured by Spearman's correlation coefficient between the computed cosine values and the semantic similarity ratings in the test data.

Prediction

If a bilingual semantic space is a plausible model of the L1 Japanese/L2 English bilingual's mental lexicon, the correlation coefficient is expected to take a high positive value. Furthermore, it is predicted that the semantic space with Japanese as a pivot language shows a higher correlation than that with an English pivot.

Result

Table 1 shows the correlation coefficients between the cosine similarity computed by the bilingual semantic spaces and the semantic similarity ratings of the test data. First of all, the correlation coefficients for all pairs were moderately high and statistically significant. This indicates that the proposed multilingual DSM framework provides a plausible model of

Table 1: Correlation coefficients between the cosine similarity computed by the bilingual semantic spaces and the semantic similarity ratings by Allen and Conklin (2014).

                                  All pairs   Cognates   Noncognates
  Pivot language                  (n = 181)   (n = 89)   (n = 92)
  Weighting BEFORE concatenation
    Japanese                      .294***     .284**     .316**
    English                       .247***     .221*      .282**
    Synset                        .291***     .290**     .304**
  Weighting AFTER concatenation
    Japanese                      .342***     .329**     .368***
    English                       .328***     .311**     .363***
    Synset                        .377***     .395***    .371***
  *p < .05. **p < .01. ***p < .001

the bilingual mental lexicon. In addition, the semantic space with Japanese as a pivot language achieved higher correlations than the semantic space with an English pivot, regardless of whether the correlations were calculated over all pairs or over cognates and noncognates separately. This result is consistent with the prediction mentioned earlier, and thus suggests that the pivot language of the multilingual DSM can correctly model the dominant language of sequential bilinguals. The correlation coefficients for the DSM with synsets as a pivot language, which is expected to model the mental lexicon of simultaneous bilinguals, did not differ from (in the case of weighting before concatenation) or were slightly higher than (in the case of weighting after concatenation) those of the DSM with a Japanese pivot. We do not have a reasonable explanation of this result at the moment, but it may reflect the fact that, as bilinguals become more proficient in their L2, their L2 lexical knowledge comes to be learned via a conceptual representation (Kroll & Stewart, 1994).

A comparison of the results between cognate and noncognate pairs shows that the proposed multilingual DSMs were more advantageous for noncognates. One possible reason is a word frequency effect: Japanese cognates are generally less frequent than noncognates (Allen & Conklin, 2014), and thus the cooccurrence statistics for cognates are less sufficient for plausible vector representations.

For the stage at which weighting is applied to a cooccur-

[Figure 2 here: a bar chart of median rank on a logarithmic scale (1 to 10^4), with L1 → L2 and L2 → L1 bars for the Japanese pivot and the English pivot; both contrasts are marked ***p < .001.]

Figure 2: Median ranks of the target words in the ordering of cosine similarity to the prime words for the 181 word pairs used in the evaluation experiment. J → E (E → J) denotes that Japanese (English) words in the pairs are used as primes and their paired English (Japanese) words are targets. Similarly, L1 → L2 (L2 → L1) denotes L1 (L2) primes and L2 (L1) targets, assuming that the pivot language of the multilingual DSM plays the role of L1. All the semantic spaces used here are weighted before matrix concatenation.

Jacquet, 2004).

                             Discussion

In this paper, we have proposed a novel method for constructing multilingual DSMs to provide a psychologically plausible computational model of the bilingual (or multilingual) mental lexicon. Its plausibility was tested and justified by comparing the cosine similarity computed by the multilingual DSMs with the semantic similarity data collected from Japanese-English bilinguals. In particular, the proposed method can provide a model that discriminates between sequential bilinguals with different L1s. Indeed, the evaluation experiment demonstrated that it can generate a semantic space appropriate for L1 Japanese/L2 English sequential bilinguals. However, the experiment presented in this paper is not comprehensive and rather preliminary. Further justification of the modeling performance of the multilingual DSM must await future research, but in this section we discuss the potential ability of the multilingual DSM to explain other psycholinguistic findings on bilingual lexical processing.

Research on bilingual lexical processing has demonstrated that lexical access in bilinguals is language nonselective (van
rence matrix, weighting after concatenation achieved better              Heuven & Dijkstra, 2010; Schwartz & Kroll, 2006). In other
performance than weighting before concatenation. This result             words, lexical representations in both languages are activated
is not surprising because PPMI weighting requires to estimate            in parallel regardless of which language is being processed.
the probability of context words across all target words, but            This is evidenced by the cross-language priming paradigm
weighting before concatenation computes the probability of               in which a prime word in one language facilitates a target
context words separately for each language. However, it is an            word in another language. Particularly interesting is the well-
open question whether weighting before or after concatena-               known finding that primes in L1 obviously facilitate targets
tion is plausible as a model of the bilingual mental lexicon.            in L2, but L2 primes do not reliably facilitate L1 targets (e.g.,
Although the proposed algorithm for constructing multilin-               Jiang & Forster, 2001; Schwartz & Kroll, 2006). This asym-
gual DSMs is not a psychological process model, weighting                metry effect may be able to be explained by the multilingual
after concatenation may lend support to the view of a sin-               DSM proposed in this paper. One reasonable way to do this is
gle integrated bilingual lexicon, rather than the view of two            to employ the rank of the target word under the ordering im-
separate lexicons (for a review of two views, see French &               posed by the cosine similarity to the prime word as a measure

                                                                   274
for the degree of its priming effects (e.g., Griffiths, Steyvers, & Tenenbaum, 2007). The rationale behind this assumption is that a target word that ranks higher by cosine similarity to the prime word is more activated, and thus more accessible, given the prime word. For example, if among all the words in the semantic space the target word ranks first by cosine similarity to the prime word, this suggests that the target is activated most strongly by the prime. On the other hand, if the target word ranks very low, it is less likely to be activated by the prime, even if its cosine value does not differ from that in the former case. Figure 2 shows the median rank of the target words obtained by applying this methodology to the 181 word pairs used in the evaluation experiment. The result is consistent with the asymmetry effect of cross-language priming: the Wilcoxon signed-rank test indicated that the median rank in the case of an L1 prime and an L2 target (i.e., J → E for the DSM with the Japanese pivot and E → J for the DSM with the English pivot) is significantly higher (i.e., closer to the top of the ordering) than that for an L2 prime and an L1 target.
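The rank-based measure and the signed-rank comparison described above can be sketched as follows. This is a minimal illustration: the random stand-in space and the hypothetical prime-target index pairs are not the DSMs or word pairs used in the experiment.

```python
import numpy as np
from scipy.stats import wilcoxon  # the signed-rank test used above

def target_rank(space, prime_idx, target_idx):
    """Rank of the target word (1 = nearest) when every word in the
    semantic space is ordered by cosine similarity to the prime."""
    unit = space / np.linalg.norm(space, axis=1, keepdims=True)
    sims = unit @ unit[prime_idx]      # cosine similarity to the prime
    sims[prime_idx] = -np.inf          # the prime itself does not compete
    return 1 + int(np.sum(sims > sims[target_idx]))

# Illustrative setup: a random 1000-word space in which the first 100
# rows play the role of L1 primes and rows 500-599 their L2 targets.
rng = np.random.default_rng(0)
space = rng.normal(size=(1000, 50))
pairs = [(i, 500 + i) for i in range(100)]

ranks_l1_l2 = [target_rank(space, p, t) for p, t in pairs]  # L1 -> L2
ranks_l2_l1 = [target_rank(space, t, p) for p, t in pairs]  # L2 -> L1
print(np.median(ranks_l1_l2), np.median(ranks_l2_l1))
print(wilcoxon(ranks_l1_l2, ranks_l2_l1))
```

With a real multilingual DSM, `space` would be the word-by-dimension matrix and `pairs` the prime-target index pairs; a numerically smaller (i.e., higher-placed) median rank for L1 → L2 than for L2 → L1 would mirror the priming asymmetry.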
Another well-known finding on multilingual lexical processing is that bilinguals generally perform more poorly on lexical tasks in both languages than monolinguals do (Bialystok, 2009; Bialystok et al., 2012). This disadvantage of bilinguals is considered to be due to interference from the other language. This interference effect may also be explained by comparing the median rank of word pairs in the same language between the multilingual and monolingual DSMs. For example, we computed the median rank over 163 Japanese word association pairs (chosen from the Japanese word association norm "Renso Kijunhyo") by means of the multilingual and monolingual DSMs. As predicted, the median rank of the monolingual DSM (38.0) is higher, that is, closer to the top of the ordering, than those of the multilingual DSMs (46.0 for the English pivot, p < .001; 56.0 for the Japanese pivot, p < .01).

The above discussion suggests that the multilingual DSM proposed in this paper has the potential to simulate several empirical findings on bilingual lexical processing. In addition, the proposed DSM framework may be able to simulate the behavior of a variety of bilinguals with different degrees of language proficiency and different developmental patterns. This may be realized, for example, by controlling context words (e.g., restricting context words to basic ones according to their age of acquisition) and/or by using multiple pivot languages (e.g., concatenating multilingual semantic spaces with different pivots). Exploring these issues is an interesting and vital direction for further research.
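As noted earlier, PPMI weighting after concatenation estimates context-word probabilities over the target words of both languages at once, whereas weighting before concatenation estimates them separately per language. A minimal sketch of this difference, using tiny made-up count matrices rather than real corpus data:

```python
import numpy as np

def ppmi(F):
    """Positive pointwise mutual information weighting of a
    target-by-context cooccurrence count matrix F."""
    total = F.sum()
    p_tc = F / total                       # joint probabilities
    p_t = p_tc.sum(axis=1, keepdims=True)  # target-word marginals
    p_c = p_tc.sum(axis=0, keepdims=True)  # context-word marginals
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(p_tc / (p_t * p_c))
    return np.maximum(pmi, 0.0)            # clip negative/undefined PMI to 0

# Toy cooccurrence counts: rows = target words of each language,
# columns = shared (pivot) context dimensions.
F_ja = np.array([[4.0, 0.0, 1.0], [1.0, 3.0, 0.0]])
F_en = np.array([[3.0, 1.0, 0.0], [0.0, 2.0, 2.0]])

# Weighting AFTER concatenation: context-word probabilities are
# estimated over the target words of BOTH languages at once.
W_after = ppmi(np.vstack([F_ja, F_en]))

# Weighting BEFORE concatenation: each language is weighted on its own.
W_before = np.vstack([ppmi(F_ja), ppmi(F_en)])
```

The two schemes generally yield different weights for the same cell, since the context-word marginals differ between the pooled and per-language estimates.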
Acknowledgments

This research was supported by JSPS KAKENHI Grant Number 15H02713 and a SCAT Research Grant.

References

Allen, D., & Conklin, K. (2014). Cross-linguistic similarity norms for Japanese-English translation equivalents. Behavior Research Methods, 46, 540–563.
Bader, B. W., & Chew, P. A. (2008). Enhancing multilingual latent semantic analysis with term alignment information. In Proceedings of the 22nd international conference on computational linguistics (COLING-2008) (pp. 49–56).
Bialystok, E. (2009). Bilingualism: The good, the bad, and the indifferent. Bilingualism: Language and Cognition, 12, 3–11.
Bialystok, E., Craik, F. I., & Luk, G. (2012). Bilingualism: Consequences for mind and brain. Trends in Cognitive Sciences, 16, 240–250.
Bullinaria, J. A., & Levy, J. P. (2007). Extracting semantic representations from word co-occurrence statistics: A computational study. Behavior Research Methods, 39(3), 510–526.
Dijkstra, T. (2007). The multilingual lexicon. In M. Gaskell (Ed.), The Oxford handbook of psycholinguistics (pp. 251–265). Oxford University Press.
French, R. M., & Jacquet, M. (2004). Understanding bilingual memory: Models and data. Trends in Cognitive Sciences, 8, 87–93.
Griffiths, T., Steyvers, M., & Tenenbaum, J. (2007). Topics in semantic representation. Psychological Review, 114, 211–244.
Jiang, N., & Forster, K. I. (2001). Cross-language priming asymmetries in lexical decision and episodic recognition. Journal of Memory and Language, 44, 32–51.
Jones, M. N., Kintsch, W., & Mewhort, D. J. (2006). High-dimensional semantic space accounts of priming. Journal of Memory and Language, 55, 534–552.
Kroll, J. F., & Stewart, E. (1994). Category interference in translation and picture naming: Evidence for asymmetric connections between bilingual memory representations. Journal of Memory and Language, 33, 149–174.
Landauer, T. K., & Dumais, S. T. (1997). A solution to Plato's problem: The latent semantic analysis theory of the acquisition, induction, and representation of knowledge. Psychological Review, 104, 211–240.
Recchia, G., & Jones, M. N. (2009). More data trumps smarter algorithms: Comparing pointwise mutual information with latent semantic analysis. Behavior Research Methods, 41, 647–656.
Schwartz, A. I., & Kroll, J. F. (2006). Language processing in bilingual speakers. In M. J. Traxler & M. A. Gernsbacher (Eds.), Handbook of psycholinguistics (2nd ed., pp. 967–999). Academic Press.
Turney, P. D., & Pantel, P. (2010). From frequency to meaning: Vector space models of semantics. Journal of Artificial Intelligence Research, 37, 141–188.
van Heuven, W. J., & Dijkstra, T. (2010). Language comprehension in the bilingual brain: fMRI and ERP support for psycholinguistic models. Brain Research Reviews, 64, 104–122.
Wei, C.-P., Yang, C. C., & Lin, C.-M. (2008). A latent semantic indexing-based approach to multilingual document clustering. Decision Support Systems, 45, 606–620.
Widdows, D. (2004). Geometry and meaning. CSLI Publications.