Towards a Vecsigrafo: Portable Semantics in Knowledge-based Text Analytics

Ronald Denaux and Jose Manuel Gómez-Pérez
Expert System, Madrid, Spain
{rdenaux,jmgomez}@expertsystem.com

Abstract. The proliferation of knowledge graphs and recent advances in Artificial Intelligence have raised great expectations related to the combination of symbolic and distributional semantics in cognitive tasks. This is particularly the case for knowledge-based approaches to natural language processing, as near-human symbolic understanding and explanation rely on expressive structured knowledge representations that tend to be labor-intensive, brittle and biased. This paper reports research addressing such limitations by capturing as embeddings the semantics of both words and concepts in large document corpora. We show how the emerging knowledge representation, our Vecsigrafo, can drive semantic portability capabilities that are not easily achieved by either word embeddings or knowledge graphs on their own, supporting curation, overcoming modeling gaps, and enabling interlinking and multilingualism. In doing so, we also share our experiences and lessons learned, and propose new methods that provide insight into the quality of such embeddings.

1 Introduction

For several decades, semantic systems have been predominantly developed around knowledge graphs (KGs) (and their variants: semantic networks and ontologies) at different degrees of expressivity. Language technologies, particularly knowledge-based text analytics, have heavily relied on such structured knowledge. Through the explicit representation of knowledge in well-formed, logically sound ways, KGs provide rich, expressive and actionable descriptions of the domain of interest through logical deduction and inference, and support logical explanations of reasoning outcomes. On the downside, KGs can be costly to produce, leading to scalability issues, as they require a considerable amount of well-trained human labor [9] to manually encode knowledge in the required formats. Capturing knowledge from the crowd has been suggested [1], but scalability is proportional to the number of humans contributing to such tasks. Furthermore, the involvement of humans in the modelling activities introduces a bias in how the domain is represented (its depth, breadth and focus), which can lead to brittle systems that only work in a limited context, hinder generality, and require continuous supervision and curation.

In parallel, the last decade has witnessed a shift towards statistical methods due to the increasing availability of raw data and cheap computing power. Statistical approaches to text understanding have proved to be powerful and convenient in many linguistic tasks, such as part-of-speech tagging, dependency parsing, and others. However, these methods are also limited and cannot be considered a replacement for knowledge-based text analytics. For example, humans seek causal explanations, which are hard to provide with statistical methods, as these are driven by statistical induction rather than logical deduction.

Recent results in the field of distributional semantics [13] have shown promising ways to learn features from text that can complement the knowledge already captured explicitly in KGs. Embeddings provide a compact and portable representation of words and their associated semantics that stems directly from a document corpus.
Here, the notion of semantic portability refers to the capability to capture as an information artifact (a vector) the semantics of a word from its occurrences in the corpus, and the way such an artifact enables that meaning to be merged with other forms of (possibly structured) knowledge representation.

At Expert System, we make extensive use of formal knowledge representations, including semantic networks and linguistic rule bases, as these technologies have shown higher accuracy figures when properly fine-tuned, are more resilient against scarce training data, and tend to offer better inspection capabilities, allowing us to debug and adapt when necessary. However, this comes at the cost of considerable human effort by linguists and knowledge engineers for various KG curation tasks: continuous bug fixing, keeping resources up to date (e.g. Barack Obama is/was the president of the US), and adding extensions for new domains or terms of interest (Cybersecurity, Blockchain) for each language supported. We argue that semantic portability is a key feature to facilitate KG curation tasks, deal with modeling bias, and enable interlinking of KGs.

In this paper, we present research and experiences evaluating, adopting and adapting approaches for generating and applying word and concept embeddings. Among others, we argue that using embeddings in practice (other than as inputs for further deep learning systems) is not trivial, as there is a lack of evaluation tools and best practices other than manual inspection, which we aim to minimize in the first place. Therefore, we describe some methods we have developed to check our intuitions about these systems and to establish reproducible good practices. Our main contributions are: (i) a novel method for the generation of semantically portable, joint word and concept embeddings and their applications in hybrid knowledge-based text analytics (Sect. 3); (ii) inspection and evaluation methods that have proved useful for assessing the quality and fitness of embeddings for the purpose of our research (Sect. 4). This paper also applies the embedding generation method to Expert System's case, resulting in a Vecsigrafo, and describes a practical application of the Vecsigrafo (Sect. 5).

2 Background

NLP systems which perform knowledge-based text analysis rely on a Knowledge Graph (KG) as their point of reference for performing analyses. Good KGs for text analysis represent concepts and entities, their semantic and grammatical relations, and their lexical forms, enabling the system to recognise and disambiguate those concepts. KGs used in practice include DBpedia, WordNet and BabelNet.

The Knowledge Graph is used by a text analysis engine which performs NLP tasks such as tokenization, part-of-speech tagging, etc. Furthermore, many text analysis engines can use the knowledge encoded in the KG to perform word sense disambiguation (WSD), in which particular senses are ascribed to words in the text. This can be done for simple cases, such as knowing whether apple refers to the company or the fruit, but it extends to more subtle differences, such as distinguishing between redeem as "exchanging a voucher", as "paying off a debt", or in its religious sense. The sense disambiguation results can then be used to improve further NLP tasks such as categorization and information extraction; this can be done using either machine learning or rule-based approaches.
At Expert System, our KG is called Sensigrafo^1; it relates concepts (internally called syncons) to each other via an extensible set of relations (e.g. hypernym, synonym, meronym), to their lemmas (base forms of verbs, nouns, etc.) and to a set of around 400 topic domains (e.g. clothing, biology, law, electronics). For historical, strategic and organizational reasons we maintain different Sensigrafos for the 14 languages we support natively; we have partial mappings at the conceptual level between some (but not all) of our Sensigrafos, as producing and maintaining these mappings requires prohibitive amounts of human effort. The Sensigrafo is used by Cogito, our text analysis engine, which performs word sense disambiguation with an accuracy close to 90% for languages with mature native support. On top of Cogito we have rule languages which allow us to write (or learn) custom categorization and extraction rules based on the linguistic characteristics (including disambiguated concepts) of documents.

^1 You can think of Sensigrafo as a highly curated version of WordNet.

2.1 Word and KG embeddings

Various approaches for statistical, corpus-based models of semantic representations have been proposed over the last two decades [3]. Traditionally encoded in sparse pointwise mutual information matrices, recent approaches generate embeddings: dense, low-dimensional spaces that capture similar information to the pointwise mutual information matrices. In particular, the word2vec system based on the skip-gram with negative-sampling (SGNS) algorithm [13] provides an efficient way to generate high-quality word embeddings from large corpora. Although SGNS is defined in terms of sequences of words, subsequent algorithms such as GloVe [15] and Swivel [18] have shown that similar (or better) results can be achieved by learning the vectors from sparse co-occurrence matrices. These approaches make it easier to generalise over different types of linguistic units (e.g. concepts and words), as we show in Sect. 3; this is harder to do with SGNS, since it expects non-overlapping sequences of linguistic units.

Standard corpus-based word embeddings do not encode KG-concepts due to the ambiguity of words in natural language. One approach to resolve this is to generate sense embeddings [5,10], whereby tags are added to the words in the corpus to indicate the sense and part-of-speech of the word. While this addresses the ambiguity of individual words, the resulting embeddings do not directly provide embeddings for KG-concepts, only for various synonymous word-sense pairs^2.

Approaches have also been proposed for learning concept embeddings from existing Knowledge Graphs [4,8,16]. Compared to corpus-based embeddings, we find that KG-derived embeddings are not yet as useful for our purposes because: (i) KG embeddings encode knowledge which is relatively sparse (compared to large text corpora); (ii) the original KG is already structured and easy to query and inspect; (iii) corpus-based models provide a bottom-up view of the linguistic units and reflect how language is used in practice, as opposed to KGs, which provide a top-down view as they have usually been created by human experts (e.g. Sensigrafo, which has been hand-curated by linguists). Other proposed methods can generate joint embeddings of words and KG entities [20]. However, they mix bottom-up and top-down views^3, which we want to avoid. Hence, in Section 3, we propose a corpus-based, joint word-concept embedding generation method.

^2 E.g. the word-sense pairs apple_N2 and Malus_pumila_N1 have separate embeddings, but the concept for apple tree they represent has no embedding.
^3 By aligning a TransE-like knowledge model and a SGNS-like text model.
2.2 Existing Methods and Practices for Evaluating Embeddings

One evaluation method used in the literature and in open-sourced tools relies on manual inspection: papers provide examples showing the top-n most similar words for a given input word to demonstrate semantic clustering. Similarly, visual inspection is provided in the form of t-SNE or PCA dimensionality reduction projections into two dimensions, which also show semantic clustering; an example of a tool that can be used for this purpose is the Embedding Projector [19], distributed as part of TensorFlow. However, these tools restrict the view to the neighbourhood of a single point or area of the embedding space and make it hard to understand the overall behaviour of the embeddings. In Sections 4.2 and 4.3 we introduce a plot which addresses this issue and show concrete applications.

Intrinsic evaluation methods are used to try to understand the overall quality of embeddings. In the case of word embeddings, a few papers employ such methods to provide systematic evaluations of various models [2,12]. As part of these evaluations, [2] defines five types of tasks (and lists available test sets) that compare embedding predictions to human-rated datasets. Of the five identified types, two (semantic relatedness and analogy) are consistently used in the recent literature, while the other three (synonym detection, concept categorization and selectional preference) are rarely used. Such intrinsic evaluations define specific, somewhat artificial tasks which are not end-goals of the embeddings. Typically, embeddings are generated to be used within larger (typically machine-learning based) NLP systems [17]; improvements in those NLP systems obtained by using the embeddings provide extrinsic evaluations, which are rarely mentioned in the literature. Schnabel et al. [17] show that good results in intrinsic evaluations do not guarantee better extrinsic results. In Sections 4 and 5 we present both intrinsic and extrinsic evaluations.

Fig. 1: Process for Vecsigrafo generation from a text corpus.

3 Corpus-based Joint Concept-Word Embeddings

In order to build hybrid systems which can use both bottom-up (corpus-based) embeddings and top-down (KG) knowledge, we propose to generate embeddings which share the same vocabulary as the Knowledge Graph. This means generating embeddings for knowledge items represented in the KG, such as concepts and surface forms (words and expressions) associated with the concepts in the KG^4.

The overall process for learning joint word and concept embeddings is depicted in Figure 1. We start with a text corpus to which we apply tokenization and word sense disambiguation (WSD) to generate a disambiguated corpus, which is a sequence of lexical entries (words or multiword expressions). Some of the lexical entries are annotated with a particular sense (concept) formalised in the KG. To generate embeddings for both senses and lexical entries, we need to correctly handle lexical entries which are associated with a sense in the KG; hence we extend the matrix construction phase of the Swivel [18] algorithm to generate a co-occurrence matrix which includes both lexical forms and senses as part of the vocabulary, as explained below.

^4 In RDF, this typically means values for rdfs:label properties, or words and expressions encoded as ontolex:LexicalEntry instances using the lexicon model for ontologies (see https://www.w3.org/2016/05/ontolex/).
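For concreteness, the sketch below shows one possible in-memory representation of such a disambiguated corpus in Python. The Token class and the concept identifiers are illustrative assumptions of ours; they do not reflect Cogito's actual output format.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Token:
    """One entry of the disambiguated corpus: a lexical entry (a lemma or a
    multiword expression) plus an optional KG-concept (sense) annotation."""
    lemma: str                     # e.g. "apple" or "pay off"
    concept: Optional[str] = None  # hypothetical concept id; None if undisambiguated

# A toy disambiguated sentence: stopwords dropped, lemmas kept, and some
# entries linked to (made-up) concept identifiers.
sentence = [
    Token("security", concept="en#c1"),
    Token("council", concept="en#c2"),
    Token("adopt", concept="en#c3"),
    Token("new"),
    Token("resolution", concept="en#c4"),
]
```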
Then we apply the training phase of a slightly modified version of the Swivel algorithm to learn the embeddings for the vocabulary; the modification is the addition of a vector regularization term, as suggested in [7] (equation 5), which aims to reduce the distance between the column and row (i.e. focus and context) vectors for all vocabulary elements.

Modified Swivel Co-occurrence Matrix Construction. The main modification with respect to standard Swivel^5 is that, in our case, each token in the corpus is not a single word but a lexical entry with an optional KG-concept annotation. Both lexical entries and KG-concepts need to be taken into account when calculating the co-occurrence matrix. Formally, the co-occurrence matrix X ∈ R^{V×V} contains the co-occurrence counts found over a corpus, where V ⊂ L ∪ C is the vocabulary, drawn from the union of the lexical forms L and the KG-concepts C. X_ij = #(v_i, v_j) is the frequency with which the lexical entries or concepts v_i and v_j co-occur within a certain window size w. Note that X_ij ∈ R, since this enables us to use a dynamic context window [12], weighting the co-occurrence of tokens according to their distance within the sequences^6.

^5 As implemented in https://github.com/tensorflow/models/tree/master/swivel
^6 We use a modified harmonic function h(n) = 1/n for n > 0 and h(0) = 1, which covers the case where a token has both a lexical form and a concept. This is the same weighting function used in GloVe and Swivel; word2vec uses a slightly different function d(n) = n/w.
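A minimal sketch of this modified matrix-construction step follows, reusing the Token representation above. It computes weighted co-occurrence counts over a joint vocabulary of lemmas and concepts with the harmonic weighting h(n) of footnote 6; the real Swivel implementation additionally shards and stores the matrix for training, which we omit here.

```python
from collections import defaultdict

def h(n: int) -> float:
    """Dynamic context window weight: h(n) = 1/n for n > 0 and h(0) = 1.
    Distance 0 arises when a token contributes both a lemma and a concept."""
    return 1.0 if n == 0 else 1.0 / n

def units(token) -> list:
    """Vocabulary elements contributed by one token: its lemma and, when the
    token was disambiguated, also its KG-concept."""
    return [token.lemma] + ([token.concept] if token.concept else [])

def cooccurrence(corpus, window: int = 5):
    """Weighted co-occurrence counts X_ij over lemmas and concepts.
    `corpus` is an iterable of sentences, each a list of Token objects."""
    X = defaultdict(float)  # (focus element, context element) -> weighted count
    for sentence in corpus:
        for i, focus in enumerate(sentence):
            lo = max(0, i - window)
            hi = min(len(sentence), i + window + 1)
            for j in range(lo, hi):
                w = h(abs(i - j))
                for f in units(focus):
                    for c in units(sentence[j]):
                        # skip identical elements; this keeps the lemma-concept
                        # pair of a single token, counted with h(0) = 1
                        if f != c:
                            X[(f, c)] += w
    return X
```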
3.1 Vecsigrafo Generation at Expert System

We follow the process described in Fig. 1, adapting it to Expert System's technology stack (described in Sect. 2). As input we have used the UN corpus [21], specifically the alignment between the English and Spanish corpora, which has about 22 million lines for each language. We used Cogito to perform tokenization, lemmatization and word-sense disambiguation. We use tokens at the disambiguation level; this means that some tokens correspond to single words, while others correspond to multi-word expressions (when they can be related to a Sensigrafo concept). Furthermore, we only kept lemmas (or base forms) as our lexical entries, since the Sensigrafo only associates concepts with lemmas, not with the various morphological forms in which they can appear in natural language; this reduces the size of the vocabulary. Also, we perform some filtering of the tokens by removing stopwords. We trained two vecsigrafos, for Spanish and English, for 80 epochs. The resulting vecsigrafos are summarised and compared to the corresponding Sensigrafos in Table 1. As the table shows, the UN corpus only covers between 20 and 34% of the lemmas and concepts in the respective Sensigrafos.

Table 1: Size of vocabularies (×1000) for English and Spanish Sensigrafos and Vecsigrafos

                  En-grafo          Es-grafo
  Vocab Element   Sensi-  Vecsi-    Sensi-  Vecsi-
  Lemmas          398     80        268     91
  KG-Concepts     300     67        226     52
  Total           698     147       474     143

4 Evaluating Vecsigrafos

In general, we find that generating embeddings is relatively easy when using and adapting existing tools. However, evaluating the quality of the resulting embeddings is not as straightforward. We have used both manual and visual inspection tools, but ran into the issues discussed in Sect. 2.2. In particular, the Embedding Projector [19] is limited to displaying 10K points, hence we can only visualize a part of the vecsigrafo at a time. By combining information from Sensigrafo and exploring areas of the space with this projector, we have been able to find some vocabulary elements which were "out of context"; this was typically caused either by a limitation of the corpus, or by issues with our language processing pipeline (e.g. tokenization or disambiguation), which could then be investigated further.

4.1 Semantic Relatedness

Testing on various semantic relatedness datasets gives us results which are well below the state of the art, as shown in Table 2. Part of this is due to the corpus used, which is smaller and more domain-restricted than other corpora (e.g. Wikipedia, Gigaword): when we apply standard Swivel on the UN corpus we see a substantial decrease in performance. Furthermore, the lemmatization and the inclusion of concepts in the vocabulary may introduce noise and warp the vector space, negatively impacting the results for this specific task.

Table 2: Vecsigrafo semantic relatedness results compared to the state of the art

  Model               WSsim   WSrel   simlex999   rarewords   simverb
  SotA 2015 [12]      79.4    70.6    43.3        50.8        n/a
  Swivel [18]         74.8    61.6    40.3        48.3        62.8
  Swivel UNv1.en      58.8    45.0    18.3        37.8        15.3
  Vecsigrafo UNv1.en  47.6    24.1    12.4        30.8        13.2

Although these results are disappointing, we note that (i) the results only provide information about the quality of the lemma embeddings, not the concept embeddings; and (ii) an easy way to improve these results is to train a Vecsigrafo on a larger corpus. Also, most of the available datasets are only available for English and we could not find similar datasets for Spanish.

4.2 Word Prediction Plots

To address the limitations of the top-n neighbour, visual exploration and semantic relatedness approaches discussed above, we developed a method to generate plots that can give us an overview of how well the embeddings perform across the entire vocabulary. We call these word prediction plots.

The idea is to simulate the original word2vec learning objective, namely predicting a focus word based on its context words (or vice versa), while gathering information about elements in the vocabulary. To generate the plot, we need a test corpus, which we tokenize and disambiguate in the same manner as during vecsigrafo generation. Next, we iterate over the disambiguated test corpus and, for each token, we calculate the cosine similarity between the embedding of the focus token and the weighted average vector of the context vocabulary elements in the window. After iterating over the test corpus we have, for each vocabulary element in the test corpus, a list of cosine similarity values from which we can derive statistics such as the average, standard deviation, minimum and maximum.

The plots are configurable, as you can vary: (i) the statistic to display, e.g. average, minimum; (ii) how to display vocabulary elements (e.g. vary order or colour); (iii) the version of the generated embeddings; (iv) the test corpora (e.g. Wikipedia, Europarl); (v) the window size and dynamic context window weighting scheme (see Sect. 3); (vi) the size of the test corpus: the larger the corpus, the slower it is to generate the cosine similarities, but the more lexical entries will be encountered. Plot generation takes, on a typical PC, a couple of minutes for a corpus of 10K lines and up to half an hour for a corpus of 5M lines.
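The core computation behind these plots can be sketched as follows, reusing units() and h() from the earlier listing; the function names and signatures are our assumptions. For each focus token it compares the focus embedding against the weighted average of the context embeddings, accumulating one list of similarities per vocabulary element.

```python
import numpy as np
from collections import defaultdict

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def prediction_similarities(corpus, emb: dict, window: int = 5):
    """Per vocabulary element, the cosine similarities between its vector and
    the weighted average vector of its contexts (mimicking the word2vec
    objective). `emb` maps a lemma or concept to a numpy vector."""
    sims = defaultdict(list)
    for sentence in corpus:
        for i, focus in enumerate(sentence):
            ctx, weights = [], []
            lo = max(0, i - window)
            hi = min(len(sentence), i + window + 1)
            for j in range(lo, hi):
                if j == i:
                    continue
                for u in units(sentence[j]):
                    if u in emb:
                        ctx.append(emb[u])
                        weights.append(h(abs(i - j)))
            if not ctx:
                continue
            avg = np.average(np.stack(ctx), axis=0, weights=weights)
            for f in units(focus):
                if f in emb:
                    sims[f].append(cosine(emb[f], avg))
    # From sims one can derive the average, standard deviation, minimum and
    # maximum per element and plot them against the element's frequency rank.
    return sims
```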
These types of plots are useful for verifying the quality of embeddings and for exploring hypotheses about the embedding space, as we show in Figure 2. Fig. 2a shows a plot obtained by generating random embeddings for the English vocabulary; as can be expected, the average cosine similarity is close to zero for the 150K lexical entries in the vocabulary. Fig. 2b shows a plot for early embeddings we generated when we had a bug in the calculation of the co-occurrence matrix. Manual inspection of the embeddings seemed reasonable, but the plot clearly shows that only the most frequent 5 to 10K vocabulary elements are consistently correct; for most of the remaining vocabulary elements the predictions are no better than random (although some of the predictions were good). Fig. 2c shows results for a recent version of the Spanish vecsigrafo, which are clearly better than random, although the overall values are rather low. Fig. 2d shows the plot for our current embeddings, where the vector space is re-centered as explained next.

Fig. 2: Example word prediction plots: (a) random embeddings (2M, 10), (b) buggy correlations (5M, 10), (c) uncentered (5M, 4), (d) re-centered (10K, 5); in parentheses, the number of sequences of the test corpus used and the context window size. The horizontal axis shows the rank of the vocabulary elements sorted from most to least frequent; the vertical axis shows the average cosine similarity (which can range from -1 to 1).

4.3 Vector Distribution and Calibration

One of the things we noticed using the prediction plots was that, even after fixing bugs in the co-occurrence matrix, there seemed to be a bias against the most frequent vocabulary elements, as shown in Fig. 2c, where it seems harder to predict the most frequent words based on their contexts. We formulated various hypotheses to try to understand why this was happening, which we investigated by generating further plots. A useful plot in this case was generated by calculating the average cosine similarity between each vocabulary element and a thousand randomly generated contexts. If the vector space were well distributed, we would expect to see a plot similar to Fig. 2a. However, the result depicted in Fig. 3a verifies the suspected bias by showing that, given a random context, the vector space is more likely to predict an infrequent word than a frequent one.

To avoid this bias, we can recalibrate the embedding space as follows: we calculate the centroid of all the vocabulary elements and then shift all the vectors so that the centroid becomes the origin of the space. When generating the random contexts plot again using this re-centered embedding space, we get the expected results. Figures 2c and 2d show that this re-centering also improves the prediction of the most frequent lexical entries.

Fig. 3: Average cosine similarity prediction plots for random contexts: (a) original (uncentered), (b) re-centered.
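A sketch of the re-centering step and of the random-context diagnostic described above; the matrix layout and function names are our assumptions:

```python
import numpy as np

def recenter(vectors: np.ndarray) -> np.ndarray:
    """Shift the space so the centroid of all vocabulary vectors becomes the
    origin. `vectors` has one row per vocabulary element."""
    return vectors - vectors.mean(axis=0, keepdims=True)

def random_context_similarity(vectors: np.ndarray, n_contexts: int = 1000,
                              seed: int = 0) -> np.ndarray:
    """Average cosine similarity of each vocabulary element against randomly
    generated contexts; in a well-distributed space this stays near zero for
    all elements, as in Fig. 2a."""
    rng = np.random.default_rng(seed)
    contexts = rng.normal(size=(n_contexts, vectors.shape[1]))
    contexts /= np.linalg.norm(contexts, axis=1, keepdims=True)
    normed = vectors / (np.linalg.norm(vectors, axis=1, keepdims=True) + 1e-12)
    return (normed @ contexts.T).mean(axis=1)  # one average per vocabulary element
```

Plotting random_context_similarity(recenter(vectors)) against the frequency rank of the vocabulary elements should reproduce the flat profile of Fig. 3b.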
5 Vecsigrafo Application: Cross-KG Concept Alignment

As mentioned in Sect. 2, there are many tasks currently requiring manual effort where a Vecsigrafo can be useful. In this section we discuss one such application: cross-KG alignment of concepts. Sensigrafos for different languages have been modelled by different teams and fit different strategic needs, hence they differ in terms of maturity and conceptual structure (besides the linguistic differences). This provides a use case for semantic portability as, in order to support cross-linguality, we need to be able to map concepts between different Sensigrafos as accurately as possible. We describe how we apply vecsigrafos to accomplish this.

Mapping Vector Spaces. We followed the approach of Mikolov et al. [14]: generate embeddings for different languages and then align the different spaces. To do this we need a seed dictionary between the vocabularies, which we had in the form of a partial mapping (for 20K concepts) between the Spanish and English Sensigrafos. We expanded the partial concept mapping to generate a dictionary for lemmas (covering also around 20K lemmas). We split this dictionary into a training, validation and test set (80-10-10) and tried a couple of methods to derive an alignment between the spaces, summarised in Table 3. Although Mikolov suggests using a simple linear translation matrix (TM) [14], we found this method gave very poor results. This suggests the desired alignment between the embedding spaces is highly non-linear^7, which prompted us to use neural networks (NN) with ReLU activations to capture these non-linearities. The best NN was able to include the correct translation for a given lexical entry in the top 5 of nearest neighbours in 78% of cases in our test set. In 90% of the cases we found that the top 5 suggested translations were indeed semantically close. The results indicate that the dictionary we used to train the alignment models covers the low-ambiguity cases where straightforward ontology alignment methods can be applied.

Table 3: Alignment method performance

  Method   Nodes   Hit@5
  TM       n/a     0.36
  NN2      4K      0.61
  NN2      5K      0.68
  NN2      10K     0.78
  NN3      5K      0.72

^7 We explored and verified this in a number of experiments not reported in this paper.
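The two alignment baselines can be sketched as follows. The least-squares translation matrix follows Mikolov et al. [14], and the hit@k evaluation matches the metric in Table 3. The neural variant shown in the trailing comment is an assumption about the architecture (the paper only states feed-forward networks with ReLU activations), not the authors' exact setup.

```python
import numpy as np

def fit_translation_matrix(src: np.ndarray, tgt: np.ndarray) -> np.ndarray:
    """Linear map W minimising ||src @ W - tgt||^2 over the seed dictionary;
    rows of src/tgt are aligned source/target embeddings."""
    W, *_ = np.linalg.lstsq(src, tgt, rcond=None)
    return W

def hit_at_k(mapped: np.ndarray, tgt_vocab: np.ndarray,
             gold: np.ndarray, k: int = 5) -> float:
    """Fraction of test entries whose gold target index appears among the k
    nearest target vectors (by cosine) of the mapped source vector."""
    m = mapped / np.linalg.norm(mapped, axis=1, keepdims=True)
    t = tgt_vocab / np.linalg.norm(tgt_vocab, axis=1, keepdims=True)
    topk = np.argsort(-(m @ t.T), axis=1)[:, :k]
    return float(np.mean([g in row for g, row in zip(gold, topk)]))

# Hypothetical non-linear variant, e.g. with scikit-learn:
#   from sklearn.neural_network import MLPRegressor
#   nn = MLPRegressor(hidden_layer_sizes=(512, 512), activation="relu")
#   nn.fit(src_train, tgt_train)
#   mapped = nn.predict(src_test)
```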
Combining embedding and KG information. The results shown in Table 3 are very encouraging, hence we used them to generate a bilingual vecsigrafo. To check how well the bilingual vecsigrafo generalises beyond the dictionary, we took a sample of 100 English concepts and analysed their top-5 neighbours in Spanish, as shown in Table 4. The hit@5 for the in-dictionary concepts is in line with our test results; but for the out-of-dictionary concepts, we could only manually find an exact synonym in 28% of the cases. Manual inspection showed that for over half of the concepts, the corresponding Spanish concept had not been included in the vecsigrafo, or there was no exact match in the Spanish Sensigrafo. Furthermore, as Table 1 shows, the Spanish Sensigrafo has 75K fewer concepts than the English one and, due to modelling and language differences, many concepts may be fundamentally unmappable^8. In conclusion: the bilingual vecsigrafo can help us find missing mappings, but still requires manual validation, as it does not provide a solution to the underlying problem of finding exactly synonymous concepts.

Table 4: Manual inspection of bilingual embeddings

                      in dict   out dict
  # concepts          46        64
  hit@5               0.72      0.28
  no Spanish concept  2         33

^8 The size and scope of the UN corpus limits the concepts available in Vecsigrafo.

Our next step was to design a hybrid synonym concept suggester which combines features from the bilingual vecsigrafo, the information in the Sensigrafos, and PanLex [11], a multilingual thesaurus. In broad lines, the suggester works as follows: for a given concept in the source language, we find the n nearest concepts in the target language that match the grammar type (i.e. they should both be nouns, verbs, adjectives, etc.); next, for each candidate, we calculate a set of hybrid features such as the likelihood of lemma translation, glossa similarity, absolute and relative cosine similarity, and shared hypernyms and domains. We then combine the various features into a single score, which we use to re-order the candidates and to decide whether we have found a possible exact synonym. Finally, we verify whether the suggested synonym candidate is already mapped to a different concept and, if so, we calculate the same features for this pair and compare its score to the score of the top candidate.

The output of the synonym suggester for an input is either (i) no suggestion or (ii) a suggested synonym, which can be either clashing or non-clashing. Figure 4a shows the output suggestions for 1546 English concepts, used in a large rule base for IPTC categorization, for which we did not have a Spanish mapping. We manually inspected 30 of these cases to verify the suggestions. The results, shown in Fig. 4, confirm that the suggestions are mostly accurate. In fact, for 5 of the clashing cases, the suggestion was better than the existing mapping; i.e. the existing mapping was not an exact synonym. For another 4 clashing suggestions, the existing mapping had very close meanings, indicating the concepts could be merged. In some cases suggestions were not exact synonyms, but pointed at modelling differences between the Sensigrafos. For example, for the English concept vote, a verb with glossa go to the polls, the suggested Spanish synonym was the concept votar, a verb with glossa to express an opinion or preference, for example in an election or for a referendum, which is more generic than the original concept. However, the Spanish Sensigrafo does not contain such a specific concept (among the 5 verb concepts associated with votar) and the English Sensigrafo does not contain an equivalent generic concept (among the 6 verb concepts associated with vote, plus 9 non-verb concepts).

Fig. 4: Synonym suggestions for 1546 syncons and manual inspection breakdown: (a) 1546 suggestions, (b) 16 no suggestion, (c) 10 clashing, (d) 4 non-clashing.

6 Conclusion and Future Work

This paper introduced a method, based on word sense disambiguation, for generating joint word-concept embeddings which can be combined with knowledge graphs into a Vecsigrafo, providing hybrid capabilities that could not be achieved by either the KG or word embeddings alone. We presented the evaluation methods that we have used, introducing a new kind of plot for assessing embedding spaces. Finally, we presented a practical application of the Vecsigrafo showing promising results. We believe these methods can be employed to improve KGs and tools in the Semantic Web and Computational Linguistics communities.

As future work, we intend to parallelise the presented pipelines for vecsigrafo and plot generation and apply these to larger corpora. We are also extending a rule translator [6] using Vecsigrafo to support new language combinations; previously, translations were only possible between fully-mapped, closely related languages (e.g. Italian to Spanish). We are designing Vecsigrafo-powered tools that can be used by our linguists to assist in Sensigrafo curation and alignment tasks.

Acknowledgements. This work is supported by CDTI (Spain) as project IDI-20160805 and by the European Commission under grant 700367 (DANTE, H2020-FCT-2014-2015/H2020-FCT-2015).

References

1. Aroyo, L., Welty, C.: Truth is a lie: Crowd truth and the seven myths of human annotation. AI Magazine 36(1), 15-24 (2015)
2. Baroni, M., Dinu, G., Kruszewski, G.: Don't count, predict! A systematic comparison of context-counting vs. context-predicting semantic vectors. In: ACL (2014)
3. Baroni, M., Lenci, A.: Distributional Memory: A general framework for corpus-based semantics. Computational Linguistics 36(4), 673-721 (2010)
4. Bordes, A., Usunier, N., Weston, J., Yakhnenko, O.: Translating embeddings for modeling multi-relational data. Advances in NIPS 26, 2787-2795 (2013)
5. Chen, X., Liu, Z., Sun, M.: A unified model for word sense representation and disambiguation. In: EMNLP. pp. 1025-1035 (2014)
6. Denaux, R., Biosca, J., Gomez-Perez, J.M.: Framework for supporting multilingual resource development at Expert System. In: Meta-Forum. Lisbon (2016), http://www.meta-net.eu/events/meta-forum-2016/slides/31_denaux.pdf
7. Duong, L., Kanayama, H., Ma, T., Bird, S., Cohn, T.: Learning crosslingual word embeddings without bilingual corpora. In: EMNLP-2016. pp. 1285-1295 (2016)
8. Feng, J., Huang, M., Yang, Y., Zhu, X.: GAKE: Graph aware knowledge embedding. In: COLING. pp. 641-651 (2016)
9. Gunning, D., Chaudhri, V.K., Clark, P.E., Barker, K., Chaw, S.Y., Greaves, M., Grosof, B., Leung, A., McDonald, D.D., Mishra, S., et al.: Project Halo update: Progress toward Digital Aristotle. AI Magazine 31(3), 33-58 (2010)
10. Iacobacci, I., Pilehvar, M.T., Navigli, R.: SensEmbed: Learning sense embeddings for word and relational similarity. In: 53rd ACL. pp. 95-105 (2015)
11. Kamholz, D., Pool, J., Colowick, S.M.: PanLex: Building a resource for panlingual lexical translation. In: LREC. pp. 3145-3150 (2014)
12. Levy, O., Goldberg, Y., Dagan, I.: Improving distributional similarity with lessons learned from word embeddings. Transactions of the ACL 3, 211-225 (2015)
13. Mikolov, T., Sutskever, I., Chen, K., Corrado, G., Dean, J.: Distributed representations of words and phrases and their compositionality. In: NIPS (2013)
14. Mikolov, T., Le, Q.V., Sutskever, I.: Exploiting similarities among languages for machine translation. Tech. rep., Google Inc. (Sep 2013)
15. Pennington, J., Socher, R., Manning, C.D.: GloVe: Global vectors for word representation. In: EMNLP. vol. 14, pp. 1532-1543 (2014)
16. Ristoski, P., Paulheim, H.: RDF2Vec: RDF graph embeddings for data mining. In: International Semantic Web Conference. vol. 9981 LNCS, pp. 498-514 (2016)
17. Schnabel, T., Labutov, I., Mimno, D., Joachims, T.: Evaluation methods for unsupervised word embeddings. In: EMNLP. pp. 298-307. ACL (2015)
18. Shazeer, N., Doherty, R., Evans, C., Waterson, C.: Swivel: Improving embeddings by noticing what's missing. arXiv preprint (2016)
19. Smilkov, D., Thorat, N., Nicholson, C., Reif, E., Viégas, F.B., Wattenberg, M.: Embedding Projector: Interactive visualization and interpretation of embeddings. In: Interpretable Machine Learning in Complex Systems (2016)
20. Wang, Z., Zhang, J., Feng, J., Chen, Z.: Knowledge graph and text jointly embedding. In: EMNLP. pp. 1591-1601 (2014)
21. Ziemski, M., Junczys-Dowmunt, M., Pouliquen, B.: The United Nations Parallel Corpus v1.0. In: LREC (2016)