Contextual Representations and Semi-Supervised Named Entity Recognition for Portuguese Language Pedro Vitor Quinta de Castro1 , Nádia Félix Felipe da Silva1 , and Anderson da Silva Soares1 1 Universidade Federal de Goiás, Goiânia GO 74690-900, Brazil I.pedrovitorquinta@inf.ufg.br, II.nadia@inf.ufg.br, III.anderson@inf.ufg.br Abstract. Named Entity Recognition is a Natural Language Processing task which is difficult to adapt across different domains. In this work, we propose a Semi-Supervised approach using Deep Learning models in order to support three different domains for the Portuguese language: general, police and medical. We perform the self-training of a model with an architecture based on a Bidirectional Long Short-Term Memory network with a Conditional Random Fields sequential classifier, using five Portuguese corpora. The word representations of the proposed model are contextual and provided by ELMo’s language model. The results achieve a competitive performance in the IberLEF evaluation forum. Keywords: Natural Language Processing · Named Entity Recognition · Deep Learning · Neural Networks · Portuguese Language 1 Introduction Information Extraction (IE) is the process of obtaining structured data from sources which can not be interpreted directly by machines, like texts [23]. This is particularly important considering the amount of textual information which is exchanged every minute on the internet [34]. Named Entity Recognition (NER) is the Natural Language Processing (NLP) task which focus on identifying and classifying named entities from this unstructured textual information, making them interpretable and accessible to different communication channels. When dealing with multiple domains, a NER prediction model needs to be able to handle not only the difference of lexicon between them, but also the difference of morphological features. This adds an additional layer of complexity to this task, requiring a more scalable model to perform well in this challenge. This paper describes our participation in IberLEF (Iberian Languages Eval- uation Forum), Task 1: Named Entity Recognition [31]. We present a system Copyright c 2019 for this paper by its authors. Use permitted under Creative Com- mons License Attribution 4.0 International (CC BY 4.0). IberLEF 2019, 24 Septem- ber 2019, Bilbao, Spain. Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2019) based on different deep learning architectures for both NER model and word representations. We propose a semi-supervised training in order to deal with the different domains targeted by the evaluation. 2 Related Work The first Deep Learning architectures to be applied in NER models were based on CNNs [5, 32], and later on Recurring Neural Networks (RNN)[9, 11, 4, 17, 22]. The reason why Deep Learning models perform well on NLP tasks is because they learn latent features from words, as well as the interactions between them, during the training of specific tasks, such as NER. Collobert et al. [5] proposed a model based on a Multilayer Perceptron with a convolutional layer, and the following works for NER were mostly based on bidirectional LSTMs, with a few differences between them. Huang et al. [11] used a biLSTM-CRF network with manually selected features, combined with features from SENNA [5] word embeddings. Chiu and Nicols [4] used a biLSTM model without the CRF layer for classification, and had their best results with character level features extracted from a CNN layer, concatenated with SENNA embeddings. Lample et al. [17] and Ma and Hovy [22] used similar approaches based on biLSTM-CRF models, with the difference that [17] used a biLSTM to extract character level features, combined with Word2Vec [24] representa- tions, while [22] used a CNN to extract the character level features, that were combined with GloVe [29] embeddings. These works show that biLSTM-CRF networks became a standard architecture for NER models (as well as for other NLP sequential classification tasks). Following works focused on representation of the words, instead of the actual NER model. Language models have been the primary architecture for contextualized word representations. Peters et al. [30], Devlin et al. [7] and Akbik et al. [1] developed different architectures for contextual word representations based on bidirectional lan- guage models and evaluated their performance on the NER task (as well as on other NLP tasks). Both papers, [30] and [1] used a biLSTM-CRF baseline NER model for evaluating their representation models, while [7] evaluated his model by adding a neural layer to the language model, performing the NER classification with it. The ELMo (Embeddings from Language Model) represen- tations from [30] are provided by the biLM language model, which is based on 2 biLSTM networks, with 2 layers each, and the model’s input is a character level representation provided by a CNN network. In another way, [7] created BERT, a language model based on the Transformer [36] architecture, which is based only on the neural mechanism of attention. The author from [1] created a language model on character level, in a way that his objective was not to predict words, but characters. The architecture of his CharLM model is also based on a biLSTM network. Table 1 lists the models presented on this section with their respective F-Score performance on the English benchmark from CoNLL-2003 [35]. For Portuguese language, the first work that used a Deep Learning approach was from Dos Santos and Guimarães [32], who adapted the architecture from [5] 412 Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2019) Work Benchmark F-Score Year Akbik et al. [1] CoNLL-2003 93.09% 2018 Devlin et al. (BERT Large) [7] CoNLL-2003 92.80% 2018 Devlin et al. (BERT Base) [7] CoNLL-2003 92.40% 2018 Peters et al. [30] CoNLL-2003 92.22% 2018 HAREM-Sel 76.27% Quinta de Castro et al. [3] 2018 HAREM-Tot 70.33% Da Costa e Paetzold [6] HAREM-Tot 69.14% 2018 Chiu and Nichols [4] CoNLL-2003 91.62% 2016 Ma and Hovy [22] CoNLL-2003 91.21% 2016 Lample et al. [17] CoNLL-2003 90.94% 2016 Huang et al. [11] CoNLL-2003 90.10% 2015 HAREM-Sel 71.23% Dos Santos e Guimarães [32] 2018 HAREM-Tot 65.41% Collobert et al. [5] CoNLL-2003 89.59% 2011 Table 1. NER models using Deep Learning architectures for English and Portuguese languages, both evaluated using the CoNLL script [35]. The English language results are reported on the CoNLL-2003 [35] benchmark, and the Portuguese ones are reported on the HAREM [33] benchmark. and proposed CharWNN. For this work, besides using character level features from CNN, the authors also used word embeddings that were pre-trained using the Word2Vec tool [38]. Da Costa and Paetzold [6] and Quinta de Castro et al. [3] used a BiLSTM-CRF architecture with minor differences between them. [6] concatenated character level features from a BiLSTM network with FastText [13] word embeddings, prior to passing this concatenation through another BiLSTM network. [3] used a similar approach from [17] and concatenated the character level features from a BiLSTM network with the representations of a second BiLSTM, which processed pre-trained Wang2Vec [20] embeddings. 3 Proposed Model In this work, we propose a system based on different deep learning architectures, similar to that was used by [30]: a Bidirectional Long Short-Term Memory (BiL- STM)[10] NER model with a Conditional Random Fields (CRF)[16] sequential classifer; fed by the contextual word representations from an ELMo [30] language model, combined with character level representations from a Convolutional Neu- ral Network (CNN) [8, 18]. Our system differs from [30] in the way that we do not use pre-trained word embeddings, and we use two different ELMo models, one for the general domain of Portuguese language, and one for the police domain. The ELMo embeddings are obtained using the biLM (bidirectional Language Model) [30] architecture. This architecture is based on 2 BiLSTM networks, each of them responsible for one direction in the bidirectional language model: one for keeping a representation while making predictions in the forward direction of the text and one for the reverse direction. The first layer from the biLM model 413 Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2019) produces character level features from the training words using two CNNs, one for each direction of the text, each of them with 2048 convolutional filters. They produce a representation with a total dimension of 4096, which is fed to the first BiLSTM layer of the biLM model. Each layer of the model (the CNN and the two BiLSTMs) projects the input it receives to a vector of dimension 1024. These 3 projections represent the ELMo embeddings which are produced by the biLM model. The size of the biLM training vocabulary determines the amount of words that will be predicted in the Softmax layer of the model, as shown in figure 1. 2048-dim ⟶ ⟵ (Each CNN) CNN CNN 1024-dim Projection 4096-dim ⟶ ⟵ (Each BiLSTM) BiLSTM BiLSTM 1024-dim Projection 4096-dim ⟶ ⟵ (Each BiLSTM) BiLSTM BiLSTM 1024-dim Projection Vocabulary Softmax for LM prediction Dimension Fig. 1. Layer representations of the biLM architecture and their connections between layers and projections. Note that the arrows → e ← in the LSTM layers indicate the direction of the objective function from the bidirectional language model, not the direc- tion of the LSTM networks, which are also bidirectional. Each 2-layer BiLSTM network used in this scheme works as a unidirectional language model, and their composition provides bidirectionality to the whole language model. The BiLSTM-CRF architecture used in this work is the same from the Al- lenNLP framework [2], following a parameterization similar to the one described in [30] for the NER task. The CNN network used for producing character level features from words used embeddings with dimension 16 and 128 convolutional filters of size 3, with the ReLU [12, 26] activation function. The BiLSTM network used for encoding the words has 2 layers, with 200 hidden units each. Figure 2 shows the dimensionality of the word representations obtained from the CNN and the 2 ELMo embeddings used. The 2 ELMo we use were trained in two separate domains: for the general Portuguese domain we used a Portuguese Wikipedia [37] dump, and for the police domain we used a 1.6 billion word corpus created from public documents from Brazil’s Labor Courts [15]. The Portuguese ELMo model we trained is publicly available at https://allennlp.org/elmo. For the Iber- LEF evaluation, we performed the fine tuning of this ELMo in this combined dataset, following [30]. 414 Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2019) Dimension 128 Dimension 1024 Dimension 1024 Character Level ELMo ELMo Words Representations Dimension 200 Dimension 200 ⟶ ⟵ LSTM LSTM ⟶ ⟵ LSTM LSTM ⟶⟵ [ht, ht] Dimension 400 CRF Fig. 2. Representation of words in the proposed architecture 4 Experimental Setup and Results For the Portuguese NER task, IberLEF specified the evaluation of models in three different domains: general, police and clinical. For the specific domains only person names (PER category) are annotated, while the general domain dataset is annotated with 5 different categories: person, place (PLC), organization (ORG), value (VAL) and time (TME). The following public corpora were used for the model proposed in this work: WikiNER [27], LeNER-Br [21], HAREM I [33] and MiniHAREM [28] golden collections, and Paramopama [14]. We also used a private legal corpus provided by the Datalawyer company, consisting of 76 an- notated documents from the Brazilian Labor Court. The only dataset annotated with all five categories is HAREM. These corpora have the following categories annotated in them: – HAREM: Place, Organization, Person, Time, Value, Abstraction, Work, Event, Thing and Other; – LeNER-Br: Legal Case, Law, Place, Organization, Person and Time; – Paramopama: Place, Organization, Person and Time; – WikiNER: Place, Miscellaneous, Organization and Person; – Datalawyer: Function, Legal Basis, Place, Organization, Person, Court, Set- tlement Value, Pleed Value, Conviction Value, Court Costs and District. Since only the HAREM datasets contains all the categories needed for the IberLEF evaluation, we adopted a semi-supervised approach training for an ini- tial NER model to perform the self-training of the final model. This training had the following procedure: 415 Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2019) 1. For each one of the datasets, we ignored all the entities that were not anno- tated as one of the 5 relevant categories for this evaluation. Their annotation was removed; 2. We merged the datasets from HAREM, LeNER-Br and Paramopama, and randomly split them into training, validation and test sets; 3. The resulting datasets from the previous step were used to train a NER model for bootstrapping Time and Value annotations for the datasets that didn’t contain these categories; 4. The bootstrap model was used to annotate: 4.1. Time and Value entities in the WikiNER dataset; 4.2. Value entities in the LeNER-Br dataset; 4.3. Value entities in the Paramopama dataset; 4.4. Time and Value entities in the Datalawyer dataset. 5. The resulting boostrapped corpora were merged and split into training, val- idation and test sets; 6. The resulting datasets from the previous step were used to train the final NER model that was submitted to the IberLEF evaluation. None of the existing annotations was removed or overriden during the boot- strapping of the datasets. Only words that prior to this process had no category associated to them were classified as either Time or Value, according to the bootstrap model. 4.1 Models Evaluation Prior to submitting the NER model with word representations from 2 ELMo and a CNN (henceforth referred to as 2xELMo+CNN), we performed the training of two other models, with different types of word representation: (i) ELMo+CNN and (ii) ELMo+CNN+Wang2Vec [19]. These two models use only the gen- eral domain ELMo. We performed the training of these three models using the same configuration, and performed an additional evaluation of them in the fol- lowing datasets: MiniHAREM, test datasets from Datalawyer Company and LeNER-Br, and the full datasets from Paramopama and WikiNER. For all of them, except MiniHAREM, we evaluated both variants: with and without boot- strapped Time and Value entities. The best model with the best F-Score was ELMo+CNN+Wang2Vec, followed by 2xELMo+CNN. We also evaluated the three models in all nine datasets (MiniHAREM, Data- lawyer, LeNER-Br, Paramopama and WikiNER, with these last four being evalu- ated in the original dataset, and the bootstrapped dataset). The 2xELMo+CNN had the best results for the MiniHAREM dataset, as well as for the datasets in the police domain (Datalawyer and LeNER-Br datasets). ELMo+CNN had the best results for Paramopama and WikiNER. After grouping these evaluation results by model, the best mean F-Score was from the 2xELMo+CNN variant. Since 2xELMo+CNN performed better in the police domain (which is relevant for the IberLEF evaluation), we chose this model for the task evaluation. 416 Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2019) Table 2 presents the results obtained from the IberLEF evaluation. We point out that the only corpus we did not use from HAREM to train our models was the one from HAREM II [25], which is the one used in the general domain evaluation. We also did not have any access to any type of clinical documents or embeddings, so our model contained no type of adaptation for this specific domain. Corpus Category Precision Recall F-Score Police Dataset Person 86.14% 92.82% 89.35% Clinical Dataset Person 32.47% 51.02% 39.68% General Dataset Overall 63.11% 51.69% 56.83% (SIGARRA + HAREM II) Table 2. Results from the IberLEF evaluation, for the 3 different domains. 5 Concluding Remarks For the Portuguese NER task of the Iberian Languages Evaluation Forum, we experimented with different systems based on deep learning architectures, for both NER model and word representations. For the NER model we used the BiLSTM-CRF architecture, which became a reference for sequential classifica- tion NLP tasks. For word representations we experimented with character level features from Convolutional Neural Networks, Wang2Vec pre-trained word em- beddings, and the ELMo embeddings from a biLM language model. We evalu- ated different models with different types of word representations in 5 different corpora, and submitted a system based on 2 different ELMo, combined with character level features. Our model was trained in a semi-supervised scenario, in order to account for the lack of certain types of categories in the used corpora. Our main contribution is the use of ELMo embeddings for the Portuguese NER task, which have not been reported so far in the related literature. Our pre-trained ELMo model is publicly available at https://allennlp.org/elmo. For future work, instead of training a single NER model with different ELMo representations for different domains, we will experiment with an ensemble of different models, each one trained separately in a different domain. 6 Acknowledgements Thanks to Datalawyer (https://www.datalawyer.com.br/) for the financial sup- port and for providing the legal dataset used for training the submitted model. This work was developed in Deep Learning Brazil research group. Our researches are sponsored by Copel Energy Distribution, Data-H Artificial Intelligence, Cy- berLabs Artificial Intelligence, Americas Health and iFood Food Delivery. 417 Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2019) References 1. Akbik, A., Blythe, D., Vollgraf, R.: Contextual string embeddings for sequence labeling. In: COLING 2018, 27th International Conference on Computational Lin- guistics. pp. 1638–1649 (2018) 2. AllenNLP: An open-source nlp research library, built on pytorch. (2018), https://allennlp.org/, [Online; accessed 06-July-2019] 3. Quinta de Castro, P.V., Félix Felipe da Silva, N., da Silva Soares, A.: Portuguese named entity recognition using lstm-crf. In: Villavicencio, A., Moreira, V., Abad, A., Caseli, H., Gamallo, P., Ramisch, C., Gonçalo Oliveira, H., Paetzold, G.H. (eds.) Computational Processing of the Portuguese Language. pp. 83–92. Springer International Publishing, Cham (2018) 4. Chiu, J.P., Nichols, E.: Named entity recognition with bidirectional LSTM-CNNs. Transactions of the Association for Computational Linguistics 4, 357–370 (Dec 2016), https://www.aclweb.org/anthology/Q16-1026 5. Collobert, R., Weston, J., Bottou, L., Karlen, M., Kavukcuoglu, K., Kuksa, P.: Natural language processing (almost) from scratch. J. Mach. Learn. Res. 12, 2493– 2537 (Nov 2011), http://dl.acm.org/citation.cfm?id=1953048.2078186 6. da Costa, P., Paetzold, G.H.: Effective sequence labeling with hybrid neural-crf models. In: Villavicencio, A., Moreira, V., Abad, A., Caseli, H., Gamallo, P., Ramisch, C., Gonçalo Oliveira, H., Paetzold, G.H. (eds.) Computational Process- ing of the Portuguese Language. pp. 490–498. Springer International Publishing (2018) 7. Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: Bert: Pre-training of deep bidi- rectional transformers for language understanding. arXiv:1810.04805 (2018) 8. Fukushima, K.: Neocognitron: A self-organizing neural network model for a mech- anism of pattern recognition unaffected by shift in position. Biological Cybernetics 36(4), 193–202 (Apr 1980), https://doi.org/10.1007/BF00344251 9. Graves, A., rahman Mohamed, A., Hinton, G.E.: Speech recognition with deep recurrent neural networks. CoRR abs/1303.5778 (2013) 10. Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (Nov 1997), http://dx.doi.org/10.1162/neco.1997.9.8.1735 11. Huang, Z., Xu, W., Yu, K.: Bidirectional lstm-crf models for sequence tagging. CoRR abs/1508.01991 (2015) 12. Jarrett, K., Kavukcuoglu, K., Ranzato, M., LeCun, Y.: What is the best multi-stage architecture for object recognition? In: 2009 IEEE 12th International Conference on Computer Vision. pp. 2146–2153 (2009) 13. Joulin, A., Grave, E., Bojanowski, P., Mikolov, T.: Bag of tricks for efficient text classification. In: Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers. pp. 427–431. Association for Computational Linguistics, Valencia, Spain (Apr 2017) 14. Júnior, C.M., Macedo, H., Bispo, T., Santos, F., Silva, N., Barbosa, L.: Paramopama: a Brazilian-Portuguese Corpus for Named Entity Recognition. Tech. rep., Universidade Federal de Sergipe (2015) 15. de Justiça, C.N.: Processo judicial eletrônico (pje) (2019), http://www.cnj.jus.br/tecnologia-da-informacao/processo-judicial-eletronico- pje, [Online; accessed 06-July-2019] 16. Lafferty, J.D., McCallum, A., Pereira, F.C.N.: Conditional random fields: Proba- bilistic models for segmenting and labeling sequence data. In: Proceedings of the Eighteenth International Conference on Machine Learning. pp. 282–289. ICML ’01, Morgan Kaufmann Publishers Inc. (2001) 418 Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2019) 17. Lample, G., Ballesteros, M., Subramanian, S., Kawakami, K., Dyer, C.: Neural architectures for named entity recognition. In: Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguis- tics: Human Language Technologies. pp. 260–270. Association for Computational Linguistics (2016) 18. Le Cun, Y., Boser, B., Denker, J.S., Henderson, D., Howard, R.E., Hubbard, W., Jackel, L.D.: Handwritten digit recognition with a back-propagation network. In: Proceedings of the 2Nd International Conference on Neural Information Process- ing Systems, pp. 396–404. NIPS’89, MIT Press, Cambridge, MA, USA (1989), http://dl.acm.org/citation.cfm?id=2969830.2969879 19. Ling, W., Dyer, C., Black, A., Trancoso, I.: Extension of the original word2vec using different architectures. url: https://github.com/wlin12/wang2vec 20. Ling, W., Dyer, C., Black, A.W., Trancoso, I.: Two/too simple adaptations of Word2Vec for syntax problems. In: Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. pp. 1299–1304. Association for Computational Linguistics (2015) 21. Luz de Araujo, P.H., de Campos, T.E., de Oliveira, R.R.R., Stauffer, M., Couto, S., Bermejo, P.: Lener-br: a dataset for named entity recognition in brazilian legal text. In: International Conference on the Computational Processing of Portuguese (PROPOR). pp. 313–323. Lecture Notes on Computer Science (LNCS), Springer, Canela, RS, Brazil (September 24-26 2018) 22. Ma, X., Hovy, E.: End-to-end sequence labeling via bi-directional LSTM-CNNs- CRF. In: Proceedings of the 54th Annual Meeting of the Association for Com- putational Linguistics (Volume 1: Long Papers). pp. 1064–1074. Association for Computational Linguistics (2016) 23. Maynard, D., Bontcheva, K., Augenstein, I.: Natural language processing for the semantic web. Synthesis Lectures on the Semantic Web: Theory and Technology 6(2), 1–194 (2016) 24. Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient estimation of word repre- sentations in vector space. CoRR (2013), http://arxiv.org/abs/1301.3781 25. Mota, C., Santos, D. (eds.): Desafios na avaliação conjunta do reconheci- mento de entidades mencionadas: O Segundo HAREM. Linguateca (2008), http://www.linguateca.pt/LivroSegundoHAREM/, iSBN: 978-989-20-1656-6 26. Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann ma- chines. In: Proceedings of the 27th International Conference on International Con- ference on Machine Learning. pp. 807–814. ICML’10, Omnipress (2010) 27. Nothman, J., Ringland, N., Radford, W., Murphy, T., Curran, J.R.: Learning mul- tilingual named entity recognition from wikipedia. Artificial Intelligence 194, 151– 175 (2013) 28. Nuno Cardoso: Harem e miniharem: Uma análise comparativa (7 2006) 29. Pennington, J., Socher, R., Manning, C.: Glove: Global vectors for word represen- tation. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). pp. 1532–1543. Association for Computational Linguistics (2014) 30. Peters, M., Neumann, M., Iyyer, M., Gardner, M., Clark, C., Lee, K., Zettlemoyer, L.: Deep contextualized word representations. In: Proceedings of the 2018 Confer- ence of the North American Chapter of the Association for Computational Lin- guistics: Human Language Technologies, Volume 1 (Long Papers). pp. 2227–2237. Association for Computational Linguistics (Jun 2018) 419 Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2019) 31. Sandra Collovini, Joaquim Santos, B.C.J.T.R.V.P.Q.M.S.D.B.C.R.G., Xavier, C.C.: Portuguese named entity recognition and relation extraction tasks at iberlef 2019 (2019) 32. dos Santos, C., Guimarães, V.: Boosting named entity recognition with neural character embeddings. In: Proceedings of the Fifth Named Entity Workshop. pp. 25–33. Association for Computational Linguistics, Beijing, China (Jul 2015) 33. Santos, D., Cardoso, N.: Reconhecimento de entidades mencionadas em português: Documentação e actas do HAREM, a primeira avaliação conjunta na área. Lin- guateca (November 2007), http://www.linguateca.pt/LivroHAREM/, iSBN: 978- 989-20-0731-1 34. Schultz, J.: How much data is created on the internet each day? url: https://blog.microfocus.com/how-much-data-is-created-on-the-internet- each-day/ 35. Tjong Kim Sang, E.F., De Meulder, F.: Introduction to the conll-2003 shared task: Language-independent named entity recognition. In: Proceedings of the Seventh Conference on Natural Language Learning at HLT-NAACL 2003 - Volume 4. pp. 142–147. CONLL ’03, Association for Computational Linguistics (2003) 36. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, L.u., Polosukhin, I.: Attention is all you need. In: Guyon, I., Luxburg, U.V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., Garnett, R. (eds.) Advances in Neural Information Processing Systems 30, pp. 5998–6008. Curran Associates, Inc. (2017) 37. Wikipédia: Wikipédia - a free encyclopedia (2019), https://www.wikipedia.org/, [Online; accessed 06-July-2019] 38. Word2vec: Tool for computing continuous distributed representations of words. (2013), https://code.google.com/archive/p/word2vec/, [Online; accessed 06-July- 2019] 420