=Paper= {{Paper |id=Vol-1419/paper0041 |storemode=property |title=Multilingual Distributional Semantic Models: Toward a Computational Model of the Bilingual Mental Lexicon |pdfUrl=https://ceur-ws.org/Vol-1419/paper0041.pdf |volume=Vol-1419 |dblpUrl=https://dblp.org/rec/conf/eapcogsci/Utsumi15 }} ==Multilingual Distributional Semantic Models: Toward a Computational Model of the Bilingual Mental Lexicon== https://ceur-ws.org/Vol-1419/paper0041.pdf
                       Multilingual Distributional Semantic Models:
               Toward a Computational Model of the Bilingual Mental Lexicon
                                                  Akira Utsumi (utsumi@uec.ac.jp)
                               Department of Informatics, The University of Electro-Communications
                                      1-5-1, Chofugaoka, Chofushi, Tokyo 182-8585, Japan


                             Abstract

In this paper, we propose a novel framework of a multilingual distributional semantic model to provide a psychologically plausible computational model of the bilingual mental lexicon. In the proposed framework, a monolingual semantic space for each target language is first generated from the corresponding monolingual corpus. These monolingual semantic spaces are then converted into spaces with common dimensions, which are in turn integrated into a single multilingual semantic space. The language of the dimensions, which we refer to as a pivot language, determines the type of bilinguals simulated by the model. We also tested the psychological plausibility of the proposed multilingual distributional semantic model by comparing the cosine similarity computed by the model with the cross-language word similarity ratings of L1 Japanese/L2 English sequential bilinguals. The bilingual semantic space with Japanese as a pivot language, which is predicted to be a model for L1 Japanese/L2 English sequential bilinguals, achieved better performance in simulating the similarity rating data. This suggests the plausibility of the proposed multilingual model.

Keywords: Multilingual distributional semantic model; Bilingual mental lexicon; Cross-language semantic similarity

                           Introduction

Distributional semantic models (henceforth, DSMs), or semantic space models, are models of the semantic representations of words and of the way those representations are constructed (Turney & Pantel, 2010). Semantic content is represented by a high-dimensional vector, and these vectors are constructed from a corpus by observing the distributional statistics of word occurrence. Despite their simplicity, DSMs have provided a useful framework for cognitive modeling, especially of human semantic knowledge (e.g., Jones, Kintsch, & Mewhort, 2006; Landauer & Dumais, 1997).

However, these studies have explored the mental lexicon of a monolingual speaker, and all the DSMs used in them are monolingual. Given the recent growing interest in bilingualism in cognition (e.g., Bialystok, Craik, & Luk, 2012), it is quite reasonable to consider a multilingual extension of DSMs toward a cognitive model of the bilingual (or multilingual) mental lexicon. This is what this study aims to accomplish.

In the field of natural language processing and computational linguistics, some studies (e.g., Bader & Chew, 2008; Wei, Yang, & Lin, 2008; Widdows, 2004) have proposed multilingual extensions of latent semantic analysis (LSA) for multilingual document clustering and cross-language word similarity computation. What these methods have in common is the use of a parallel corpus: a collection of bilingual (or multilingual) texts comprising sentences (or documents) in one language and their translations in other languages. By regarding aligned texts (i.e., a pair of an original text and its translations) as a single document, the method for monolingual DSMs can be directly applied to multilingual DSMs. Despite this advantage, however, the parallel-corpus-based approach to multilingual DSMs has some drawbacks. One serious drawback is that the use of parallel corpora is not psychologically plausible. It is extremely rare for bilinguals to be exposed to the same message in both languages simultaneously. Bilingual children often use different languages depending on whom they communicate with (i.e., parents or friends) and where they communicate (i.e., at home or outside). It follows that bilingual lexical development and lexical knowledge are very unlikely to be explained by the distributional statistics obtained from a parallel corpus. The other drawback of the parallel-corpus-based approach is a practical one: parallel corpora are generally less readily available, and smaller, than monolingual corpora. Therefore, multilingual semantic spaces generated from a parallel corpus cannot be expected to achieve satisfactory performance.

In this paper, therefore, we propose a novel method for constructing multilingual DSMs toward a cognitive model of the bilingual mental lexicon. To overcome the drawbacks of the parallel-corpus-based approach, our method does not use any parallel corpora; it generates a monolingual semantic space for each target language using a monolingual corpus, and then integrates the multiple monolingual spaces into a single multilingual semantic space. The integration is carried out by aligning context words in different languages either through direct correspondences between words (i.e., lexical links) or via a conceptual representation (i.e., conceptual links). This distinction is motivated by the psychological model of the bilingual mental lexicon (Kroll & Stewart, 1994). We then test the psychological plausibility of the proposed method for multilingual DSMs using the cross-language word similarity ratings of Japanese-English bilinguals (Allen & Conklin, 2014). Finally, we discuss the potential of the proposed method to simulate a variety of findings on the bilingual lexicon and to provide a tool for bilingual research.

                        Computational Model

In this section, we first review a general method for constructing a monolingual semantic space. We then propose a novel method for constructing a multilingual semantic space, by which the monolingual semantic spaces for the target languages are integrated into a single multilingual semantic space.

Monolingual DSM

The method for constructing semantic spaces generally comprises the following three steps:

1. Initial matrix construction: n content words in a given corpus are represented as m-dimensional initial vectors whose

[Figure 1 here: a two-panel diagram. In each panel, Step 1 builds one word cooccurrence matrix from the Japanese corpus and one from the English corpus (e.g., target words 犬 'dog' and 猫 'cat' with context words such as 警察 'police', 吠える 'bark', 鳴く 'mew', ネズミ 'mouse'); Step 2 converts the matrices to pivot-language context words; Step 3 concatenates the converted matrices into one. Panel (a): English as a pivot language. Panel (b): concepts as a pivot language.]

Figure 1: A rough sketch of the multilingual distributional semantic model proposed in this paper: the case of an English-Japanese bilingual semantic space.
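Step 1 of Figure 1 (building each monolingual cooccurrence matrix) amounts to counting, for every target word, the words that occur near it in the corpus. The following is a minimal sketch, not the authors' implementation; it assumes the two-word window used later in this paper, and the tiny "corpus" is an invented example:

```python
from collections import defaultdict

def count_cooccurrences(tokens, window=2):
    """Words-as-contexts counting: for each target token, add one to
    every (target, context) pair within `window` words on either side."""
    counts = defaultdict(int)
    for i, target in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                counts[(target, tokens[j])] += 1
    return counts

# A toy "corpus" of content words (as after stopword removal)
corpus = ["dog", "bark", "police", "dog", "bark"]
counts = count_cooccurrences(corpus)
# counts[("dog", "bark")] == 3, and the counts are symmetric
```

Collecting these counts over a shared vocabulary of n target words and m context words gives the initial n-by-m matrix A described above.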


elements are frequencies in a linguistic context. As a result, an n-by-m matrix A = (a_ij) is constructed using the n word vectors as rows.

2. Weighting: The elements of the matrix A are weighted.

3. Smoothing: The dimensionality of the row vectors of A is reduced from the initial dimension m to r.

As a result, an r-dimensional semantic space containing n words is generated.

For initial matrix construction in Step 1, two popular methods are used for computing the elements a_ij of A. In a "documents-as-contexts" method, an element a_ij is determined as the frequency of a word w_i in a document d_j (i.e., the number of times the word w_i occurs in the document d_j). In a "words-as-contexts" method, on the other hand, a_ij is calculated as the cooccurrence frequency of a target word w_i and a context word w_j within a certain context (i.e., the number of times the two words w_i and w_j cooccur in that context). As the context for counting cooccurrences, we use a "window" spanning two words on either side of the target word. Note that the existing methods for multilingual LSA often employ a documents-as-contexts matrix by regarding aligned texts in a parallel corpus as a single document. In contrast, our method does not


use a parallel corpus, and thus we apply a words-as-contexts method to initial matrix construction, as we will explain in the next subsection.

For Step 2, various weighting methods have been proposed. Two popular ones are entropy-based tf-idf weighting (Landauer & Dumais, 1997) and PPMI (positive pointwise mutual information) weighting (Bullinaria & Levy, 2007; Recchia & Jones, 2009). In this paper, we use PPMI weighting because it is suitable for words-as-contexts matrices and generally achieves good performance (Bullinaria & Levy, 2007; Recchia & Jones, 2009). PPMI is based on pointwise mutual information (PMI) and replaces negative PMI values with zero. The last step (Step 3), smoothing, is optional and is usually carried out using singular value decomposition (SVD). In this paper, we do not smooth the matrix, because PPMI semantic spaces generally achieve good performance even when smoothing is not applied (Recchia & Jones, 2009).

Multilingual DSM

The basic idea underlying our multilingual DSM method is that the word cooccurrence (i.e., words-as-contexts) matrices generated for each target language from a monolingual corpus are converted into cooccurrence matrices sharing the same set of context words; as a result, word vectors in different languages are placed in a single semantic space with common dimensions. The language of the context words can be one of the target languages or another "language" representing concepts. We refer to this language as a pivot language.

Figure 1 illustrates our multilingual DSM method in the case of a Japanese-English bilingual DSM. The first step is to generate a word cooccurrence matrix for each target language using a monolingual corpus. In Figure 1 (a), two cooccurrence matrices, one for Japanese and the other for English, are constructed separately. For example, the Japanese cooccurrence matrix expresses that the target word 犬 'dog' cooccurs with the context word 警察 'police' once and with the context word 吠える 'bark' five times.

In the second step, the monolingual cooccurrence matrix for a target language is converted into a matrix expressing pseudo-cooccurrences between target words in that language and context words in the pivot language. As shown in Figure 1 (a), when English is the pivot language, the Japanese cooccurrence matrix must be converted into a matrix with English context words, while the English cooccurrence matrix does not need to be converted. Conversely, when Japanese is the pivot language, the English cooccurrence matrix is converted but the Japanese matrix is not. Furthermore, when a set of concepts is used as the pivot language, as shown in Figure 1 (b), both cooccurrence matrices are converted into pseudo-cooccurrence matrices with concepts as contexts.

The matrix conversion in Step 2 can be done by translating context words into the pivot language using a dictionary or other lexical database, and by counting the "pseudo" cooccurrence frequency between a word in the target language and a context word in the pivot language. For example, as shown in Figure 1 (a), the context word 警察 has one translation equivalent, police, in English, and the cooccurrence frequency between the target word 犬 and 警察 is 1. It follows that the cooccurrence frequency of 犬 and police is counted as 1. If a context word has more than one equivalent in the pivot language, the pseudo-cooccurrence frequencies for the other equivalents are also counted in the same way. For example, because the context word 吠える has at least two equivalents, bark and roar, the cooccurrence frequency between 犬 and roar, as well as between 犬 and bark, is counted as 5. Note that the context word bark is also an equivalent of the context word 鳴く 'cry, mew', and thus the pseudo-cooccurrence frequency between 犬 and bark is 7 (= 5 from 吠える + 2 from 鳴く). Finally, in the last step (i.e., Step 3), the converted matrix for Japanese and the original cooccurrence matrix for English in Figure 1 (a) (or the converted English matrix in Figure 1 (b)) are concatenated into a single matrix expressing an English-Japanese bilingual semantic space.

In this framework of multilingual DSMs, the pivot language determines the type of bilinguals for which the constructed semantic space is suitable. Because all the words in all target languages are represented through the pivot language, the pivot language can be regarded as the dominant language of the bilinguals (or multilinguals). Hence, the multilingual DSM generated by this method can be regarded as a model of the mental lexicon of sequential bilinguals whose L1 is the pivot language. For example, the multilingual DSM with English as a pivot language shown in Figure 1 (a) is expected to be a cognitive model for L1 English/L2 Japanese bilinguals. When concepts are used as the pivot language, as in Figure 1 (b), the resulting DSM is assumed to be a model for simultaneous bilinguals, who are exposed to bilingual input from birth. This assumption is reasonable because simultaneous bilinguals do not have a dominant language, and their lexical development proceeds in both languages alike through concepts.

In the explanation given above, we used a Japanese-English bilingual DSM as an example, but our method is not specific to bilingualism. Formally, given k target languages L_1, L_2, ..., L_k, the method for constructing a multilingual semantic space on the basis of our idea can be described in the following steps:

1. The cooccurrence matrices A_11, A_22, ..., A_kk for the target languages L_1, L_2, ..., L_k are constructed from the corresponding monolingual corpora by the method for constructing monolingual DSMs.

2. Using conversion matrices D_ip (1 ≤ i ≤ k) from a target language L_i into a pivot language L_p, the cooccurrence matrices A_ii generated above are converted into matrices A_ip with the same dimensions:

       A_ip = A_ii × D_ip                                    (1)

   Note that, if L_i is the pivot language L_p, then A_ip = A_ii.

3. All the converted matrices A_1p, A_2p, ..., A_kp are concatenated vertically into a single matrix A:

       A = [A_1p; A_2p; ...; A_kp]                           (2)
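The steps above, together with the PPMI weighting discussed earlier, can be sketched in code. This is a minimal sketch under toy assumptions, not the authors' implementation; the function names and the romanized dictionary keys below are ours:

```python
import numpy as np

def conversion_matrix(contexts_i, contexts_p, dictionary):
    """Build D_ip: the (s, t) entry is 1 if pivot word t is a
    translation equivalent of context word s, and 0 otherwise."""
    D = np.zeros((len(contexts_i), len(contexts_p)))
    col = {w: t for t, w in enumerate(contexts_p)}
    for s, w in enumerate(contexts_i):
        for equiv in dictionary.get(w, []):
            if equiv in col:
                D[s, col[equiv]] = 1.0
    return D

def ppmi(M):
    """PPMI weighting: pointwise mutual information, negatives -> 0."""
    total = M.sum()
    row = M.sum(axis=1, keepdims=True)
    col = M.sum(axis=0, keepdims=True)
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(M * total / (row * col))
    return np.where(np.isfinite(pmi) & (pmi > 0), pmi, 0.0)

def multilingual_space(A_list, D_list, weight_before=True):
    """Eq. (1): A_ip = A_ii @ D_ip, then Eq. (2): stack vertically.
    `weight_before` selects weighting before or after concatenation."""
    converted = [A @ D for A, D in zip(A_list, D_list)]
    if weight_before:
        return np.vstack([ppmi(A) for A in converted])
    return ppmi(np.vstack(converted))
```

With the toy counts of Figure 1 (a), converting the Japanese matrix (犬 row: 警察 = 1, 吠える = 5, 鳴く = 2, ネズミ = 0) through a dictionary mapping 吠える to {bark, roar} and 鳴く to {mew, bark} reproduces the pseudo-cooccurrences police = 1, bark = 7, mew = 2, mouse = 0, roar = 5 shown in the figure.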

The resulting matrix A represents a multilingual semantic space.

At Step 2, we use a conversion (or term alignment) matrix D_ip to translate context words into the pivot language. The (s, t) entry of the matrix D_ip is 1 (or another nonzero value) if the word w_t in the pivot language L_p is a translation of the word w_s in the language L_i, and 0 otherwise.

Weighting (i.e., Step 2 of the monolingual DSM method presented in the last section) can be applied either after Step 2 or after Step 3 of the above algorithm. Weighting after Step 2 implies that the converted matrices A_ip are weighted before they are concatenated into A, while weighting after Step 3 implies that the concatenated matrix A is weighted. Note that some weighting methods, such as entropy-based ones, give the same matrix A regardless of whether weighting is applied before or after matrix concatenation, because in those methods the word vectors (i.e., the row vectors of A_ip) are weighted independently of each other. In the case of PPMI weighting, however, different matrices A are generated according to the timing of weighting.

                       Evaluation Experiment

Test Data

As test data for evaluating multilingual DSMs, we used the cross-linguistic similarity norms for Japanese-English translations provided by Allen and Conklin (2014). These data comprise semantic similarity and phonological similarity ratings for 193 Japanese-English word pairs, along with other relevant measures. Among these ratings, we used the semantic similarity ratings on a 5-point scale ranging from 1 to 5, and compared them with the cosine similarity computed by the multilingual DSMs to evaluate their modeling performance.

The 193 word pairs are divided into 98 cognates and 95 noncognates. Cognates are words in different languages that share both form and meaning. For example, the Japanese word カメラ /kamera/ and the English word "camera" are cognates. The cognates used in Allen and Conklin's (2014) study are all loanwords in Japanese, i.e., words borrowed from English and written in a separate script, katakana. Noncognates (e.g., 希望 'hope' and "hope") have the same meaning but do not share form. Cognates have been central to psycholinguistic research on bilingual language processing because they provide an effective way of examining the essential question of whether bilinguals selectively activate a single language or activate both languages simultaneously (Dijkstra, 2007).

Allen and Conklin's (2014) semantic similarity norms were collected from native speakers of Japanese who also speak English as a second language, namely L1 Japanese/L2 English speakers. Hence, these semantic similarity data can be regarded as reflecting the mental lexicon of sequential bilinguals whose L1 is Japanese and whose L2 is English.

Materials for Multilingual DSM

As explained before, the multilingual DSM proposed in this paper requires two kinds of language resources, namely a monolingual corpus for each target language and a dictionary (or lexical database) for converting between a pivot language and target languages. In this experiment, we used as monolingual corpora Japanese newspaper articles (i.e., six years' worth of Mainichi newspaper articles) with 41.2M word tokens, and the written, non-fiction parts of the British National Corpus with 54.7M word tokens. In order to determine the vocabulary of the semantic space, we performed the widely used preprocessing steps of stopword removal and lemmatization. As a dictionary, English WordNet 3.0 and Japanese WordNet 1.1 were used. WordNet is not a dictionary, but it can serve as one by connecting words in different languages via synsets. WordNet synsets are sets of cognitive synonyms, each expressing a distinct concept. Synsets provide an additional merit of using WordNet in that they can be used as a pivot language representing concepts (or, more precisely, a pivot concept). The Japanese and English words included in the bilingual semantic spaces were selected so that they can be translated into each other via WordNet. In other words, each of these Japanese words shares at least one synset with at least one of these English words. As a result, 22,416 Japanese words and 18,463 English words were selected as the vocabulary of the bilingual semantic space. These Japanese and English words are related via 23,421 synsets. Therefore, the size of the monolingual cooccurrence matrix in Step 1 of the proposed algorithm was 22,416 × 22,416 for Japanese and 18,463 × 18,463 for English.

Method

First, using the corpora and WordNet mentioned above, we constructed six bilingual semantic spaces from all combinations of three pivot languages (Japanese, English, and synset) and two timings of weighting (before or after matrix concatenation).

Using these six semantic spaces, we computed the cosine similarity of each pair of words in the test data. Note that 12 of the 193 word pairs of Allen and Conklin's (2014) data did not exist in the bilingual semantic spaces, and thus the remaining 181 word pairs (comprising 89 cognates and 92 noncognates) were used for similarity computation.

The performance of each bilingual semantic space was measured by Spearman's correlation coefficient between the computed cosine values and the semantic similarity ratings in the test data.

Prediction

If a bilingual semantic space is a plausible model of the L1 Japanese/L2 English bilingual's mental lexicon, the correlation coefficient is expected to take a high positive value. Furthermore, it is predicted that the semantic space with Japanese as a pivot language shows a higher correlation than that with an English pivot.

Result

Table 1 shows the correlation coefficients between the cosine similarity computed by the bilingual semantic spaces and the semantic similarity ratings of the test data. First of all, the correlation coefficients for all pairs were moderately high and statistically significant. This indicates that the proposed multilingual DSM framework provides a plausible model of

Table 1: Correlation coefficients between the cosine similarity computed by the bilingual semantic spaces and the semantic similarity ratings by Allen and Conklin (2014).

                                  All pairs   Cognates   Noncognates
  Pivot language                  (n = 181)   (n = 89)   (n = 92)
  Weighting BEFORE concatenation
    Japanese                      .294***     .284**     .316**
    English                       .247***     .221*      .282**
    Synset                        .291***     .290**     .304**
  Weighting AFTER concatenation
    Japanese                      .342***     .329**     .368***
    English                       .328***     .311**     .363***
    Synset                        .377***     .395***    .371***
  *p < .05. **p < .01. ***p < .001

the bilingual mental lexicon. In addition, the semantic space with Japanese as a pivot language achieved higher correlations than the semantic space with an English pivot, regardless of whether the correlations were calculated over all pairs or over cognates and noncognates separately. This result is consistent with the prediction mentioned earlier, and thus suggests that the pivot language of the multilingual DSM can correctly model the dominant language of sequential bilinguals. The correlation coefficients for the DSM with synsets as a pivot language, which is expected to model the mental lexicon of simultaneous bilinguals, did not differ from (in the case of weighting before concatenation) or were slightly higher than (in the case of weighting after concatenation) those of the DSM with a Japanese pivot. We do not have a reasonable explanation of this result at the moment, but it may reflect the fact that, as bilinguals become more proficient in their L2, their L2 lexical knowledge comes to be learned via a conceptual representation (Kroll & Stewart, 1994).

A comparison of the results between cognate and noncognate pairs shows that the proposed multilingual DSMs were more advantageous for noncognates. One possible reason is a word frequency effect: Japanese cognates are generally less frequent than noncognates (Allen & Conklin, 2014), and thus the cooccurrence statistics for cognates are less sufficient for plausible vector representations.

For the stage at which weighting is applied to a cooccur-

[Figure 2 here: a bar chart of median rank on a logarithmic scale (1 to 10^4), with L1 → L2 and L2 → L1 bars for the Japanese pivot and the English pivot; both contrasts are marked ***p < .001.]

Figure 2: Median ranks of the target words in the ordering of cosine similarity to the prime words for the 181 word pairs used in the evaluation experiment. J → E (E → J) denotes that Japanese (English) words in the pairs are used as primes and their paired English (Japanese) words are targets. Similarly, L1 → L2 (L2 → L1) denotes L1 (L2) primes and L2 (L1) targets, assuming that the pivot language of the multilingual DSM plays the role of L1. All the semantic spaces used here are weighted before matrix concatenation.

Jacquet, 2004).

                             Discussion

In this paper, we have proposed a novel method for constructing multilingual DSMs to provide a psychologically plausible computational model of the bilingual (or multilingual) mental lexicon. Its plausibility was tested and justified by comparing the cosine similarity computed by the multilingual DSMs with the semantic similarity data collected from Japanese-English bilinguals. In particular, the proposed method can provide a model that discriminates between sequential bilinguals with different L1s. Indeed, the evaluation experiment demonstrated that it can generate a semantic space appropriate for L1 Japanese/L2 English sequential bilinguals. However, the experiment presented in this paper is not comprehensive and rather preliminary. Further justification of the modeling performance of the multilingual DSM must await future research, but in this section we discuss the potential ability of the multilingual DSM to explain other psycholinguistic findings on bilingual lexical processing.

Research on bilingual lexical processing has demonstrated that lexical access in bilinguals is language nonselective (van
rence matrix, weighting after concatenation achieved better              Heuven & Dijkstra, 2010; Schwartz & Kroll, 2006). In other
performance than weighting before concatenation. This result             words, lexical representations in both languages are activated
is not surprising because PPMI weighting requires to estimate            in parallel regardless of which language is being processed.
the probability of context words across all target words, but            This is evidenced by the cross-language priming paradigm
weighting before concatenation computes the probability of               in which a prime word in one language facilitates a target
context words separately for each language. However, it is an            word in another language. Particularly interesting is the well-
open question whether weighting before or after concatena-               known finding that primes in L1 obviously facilitate targets
tion is plausible as a model of the bilingual mental lexicon.            in L2, but L2 primes do not reliably facilitate L1 targets (e.g.,
Although the proposed algorithm for constructing multilin-               Jiang & Forster, 2001; Schwartz & Kroll, 2006). This asym-
gual DSMs is not a psychological process model, weighting                metry effect may be able to be explained by the multilingual
after concatenation may lend support to the view of a sin-               DSM proposed in this paper. One reasonable way to do this is
gle integrated bilingual lexicon, rather than the view of two            to employ the rank of the target word under the ordering im-
separate lexicons (for a review of two views, see French &               posed by the cosine similarity to the prime word as a measure

                                                                   274
for the degree of its priming effects (e.g., Griffiths, Steyvers, & Tenenbaum, 2007). The rationale behind this assumption is that a target word that ranks higher by cosine similarity to the prime word is more activated, and thus more accessible, given the prime word. For example, if among all the words in the semantic space the target word ranks first by cosine similarity to the prime word, this suggests that the target is activated most strongly by the prime. On the other hand, if the target word ranks very low, it is less likely to be activated by the prime, even if its cosine value does not differ from that in the former case. Figure 2 shows the median rank of the target words obtained by applying this methodology to the 181 word pairs used in the evaluation experiment. The result is consistent with the asymmetry effect of cross-language priming: the Wilcoxon signed-rank test indicated that the median rank in the case of an L1 prime and an L2 target (i.e., J → E for the DSM with the Japanese pivot and E → J for the DSM with the English pivot) is significantly higher (i.e., closer to the top of the ordering) than that for an L2 prime and an L1 target.
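The rank-based measure and the signed-rank comparison described above can be sketched as follows. This is a minimal illustration: the random stand-in space and the hypothetical prime-target index pairs are not the DSMs or word pairs used in the experiment.

```python
import numpy as np
from scipy.stats import wilcoxon  # the signed-rank test used above

def target_rank(space, prime_idx, target_idx):
    """Rank of the target word (1 = nearest) when every word in the
    semantic space is ordered by cosine similarity to the prime."""
    unit = space / np.linalg.norm(space, axis=1, keepdims=True)
    sims = unit @ unit[prime_idx]      # cosine similarity to the prime
    sims[prime_idx] = -np.inf          # the prime itself does not compete
    return 1 + int(np.sum(sims > sims[target_idx]))

# Illustrative setup: a random 1000-word space in which the first 100
# rows play the role of L1 primes and rows 500-599 their L2 targets.
rng = np.random.default_rng(0)
space = rng.normal(size=(1000, 50))
pairs = [(i, 500 + i) for i in range(100)]

ranks_l1_l2 = [target_rank(space, p, t) for p, t in pairs]  # L1 -> L2
ranks_l2_l1 = [target_rank(space, t, p) for p, t in pairs]  # L2 -> L1
print(np.median(ranks_l1_l2), np.median(ranks_l2_l1))
print(wilcoxon(ranks_l1_l2, ranks_l2_l1))
```

With a real multilingual DSM, `space` would be the word-by-dimension matrix and `pairs` the prime-target index pairs; a numerically smaller (i.e., higher-placed) median rank for L1 → L2 than for L2 → L1 would mirror the priming asymmetry.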
Another well-known finding on multilingual lexical processing is that bilinguals generally perform more poorly on lexical tasks in both languages than monolinguals do (Bialystok, 2009; Bialystok et al., 2012). This disadvantage of bilinguals is considered to be due to interference from the other language. This interference effect may also be explained by comparing the median rank of word pairs in the same language between the multilingual and monolingual DSMs. For example, we computed the median rank over 163 Japanese word association pairs (chosen from the Japanese word association norm "Renso Kijunhyo") by means of the multilingual and monolingual DSMs. As predicted, the median rank of the monolingual DSM (38.0) is higher, that is, closer to the top of the ordering, than those of the multilingual DSMs (46.0 for the English pivot, p < .001; 56.0 for the Japanese pivot, p < .01).

The above discussion suggests that the multilingual DSM proposed in this paper has the potential to simulate several empirical findings on bilingual lexical processing. In addition, the proposed DSM framework may be able to simulate the behavior of a variety of bilinguals with different degrees of language proficiency and different developmental patterns. This may be realized, for example, by controlling context words (e.g., restricting context words to basic ones according to their age of acquisition) and/or by using multiple pivot languages (e.g., concatenating multilingual semantic spaces with different pivots). Exploring these issues is an interesting and vital direction for further research.
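As noted earlier, PPMI weighting after concatenation estimates context-word probabilities over the target words of both languages at once, whereas weighting before concatenation estimates them separately per language. A minimal sketch of this difference, using tiny made-up count matrices rather than real corpus data:

```python
import numpy as np

def ppmi(F):
    """Positive pointwise mutual information weighting of a
    target-by-context cooccurrence count matrix F."""
    total = F.sum()
    p_tc = F / total                       # joint probabilities
    p_t = p_tc.sum(axis=1, keepdims=True)  # target-word marginals
    p_c = p_tc.sum(axis=0, keepdims=True)  # context-word marginals
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(p_tc / (p_t * p_c))
    return np.maximum(pmi, 0.0)            # clip negative/undefined PMI to 0

# Toy cooccurrence counts: rows = target words of each language,
# columns = shared (pivot) context dimensions.
F_ja = np.array([[4.0, 0.0, 1.0], [1.0, 3.0, 0.0]])
F_en = np.array([[3.0, 1.0, 0.0], [0.0, 2.0, 2.0]])

# Weighting AFTER concatenation: context-word probabilities are
# estimated over the target words of BOTH languages at once.
W_after = ppmi(np.vstack([F_ja, F_en]))

# Weighting BEFORE concatenation: each language is weighted on its own.
W_before = np.vstack([ppmi(F_ja), ppmi(F_en)])
```

The two schemes generally yield different weights for the same cell, since the context-word marginals differ between the pooled and per-language estimates.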
Acknowledgments

This research was supported by JSPS KAKENHI Grant Number 15H02713 and a SCAT Research Grant.

References

Allen, D., & Conklin, K. (2014). Cross-linguistic similarity norms for Japanese-English translation equivalents. Behavior Research Methods, 46, 540–563.
Bader, B. W., & Chew, P. A. (2008). Enhancing multilingual latent semantic analysis with term alignment information. In Proceedings of the 22nd international conference on computational linguistics (COLING-2008) (pp. 49–56).
Bialystok, E. (2009). Bilingualism: The good, the bad, and the indifferent. Bilingualism: Language and Cognition, 12, 3–11.
Bialystok, E., Craik, F. I., & Luk, G. (2012). Bilingualism: Consequences for mind and brain. Trends in Cognitive Sciences, 16, 240–250.
Bullinaria, J. A., & Levy, J. P. (2007). Extracting semantic representations from word co-occurrence statistics: A computational study. Behavior Research Methods, 39(3), 510–526.
Dijkstra, T. (2007). The multilingual lexicon. In M. Gaskell (Ed.), The Oxford handbook of psycholinguistics (pp. 251–265). Oxford University Press.
French, R. M., & Jacquet, M. (2004). Understanding bilingual memory: Models and data. Trends in Cognitive Sciences, 8, 87–93.
Griffiths, T., Steyvers, M., & Tenenbaum, J. (2007). Topics in semantic representation. Psychological Review, 114, 211–244.
Jiang, N., & Forster, K. I. (2001). Cross-language priming asymmetries in lexical decision and episodic recognition. Journal of Memory and Language, 44, 32–51.
Jones, M. N., Kintsch, W., & Mewhort, D. J. (2006). High-dimensional semantic space accounts of priming. Journal of Memory and Language, 55, 534–552.
Kroll, J. F., & Stewart, E. (1994). Category interference in translation and picture naming: Evidence for asymmetric connections between bilingual memory representations. Journal of Memory and Language, 33, 149–174.
Landauer, T. K., & Dumais, S. T. (1997). A solution to Plato's problem: The latent semantic analysis theory of the acquisition, induction, and representation of knowledge. Psychological Review, 104, 211–240.
Recchia, G., & Jones, M. N. (2009). More data trumps smarter algorithms: Comparing pointwise mutual information with latent semantic analysis. Behavior Research Methods, 41, 647–656.
Schwartz, A. I., & Kroll, J. F. (2006). Language processing in bilingual speakers. In M. J. Traxler & M. A. Gernsbacher (Eds.), Handbook of psycholinguistics (2nd ed., pp. 967–999). Academic Press.
Turney, P. D., & Pantel, P. (2010). From frequency to meaning: Vector space models of semantics. Journal of Artificial Intelligence Research, 37, 141–188.
van Heuven, W. J., & Dijkstra, T. (2010). Language comprehension in the bilingual brain: fMRI and ERP support for psycholinguistic models. Brain Research Reviews, 64, 104–122.
Wei, C.-P., Yang, C. C., & Lin, C.-M. (2008). A latent semantic indexing-based approach to multilingual document clustering. Decision Support Systems, 45, 606–620.
Widdows, D. (2004). Geometry and meaning. CSLI Publications.