<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Word sense disambiguation of Arabic language with Word Embeddings as part of the Creation of a Historical Dictionary</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Rim Laatar</string-name>
          <email>rimlaatar@yahoo.fr</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Chafik Aloulou</string-name>
          <email>chafik.aloulou@fsegs.rnu.tn</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Lamia Hadrich Belguith</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>ANLP Research Group</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>A historical dictionary is a dictionary that traces the detailed history of words from their first appearance in the language, as well as the evolution of their meaning and use throughout history. Creating such a dictionary involves many steps, one of which is extracting the appropriate meaning of a given word occurring in a given context, a task also known as word sense disambiguation (WSD). This article proposes a word-embedding-based method to solve the WSD problem. The main idea is to exploit vectors as word representations in a multidimensional space in order to capture the semantic and syntactic properties of words. Experiments show that the proposed system achieves a precision of 78%.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Linguists state that the major goal of the historical dictionary of Arabic is to
encompass the entire Arabic lexicon throughout its history, presenting every
Arabic word in its morphological, semantic and contextual development from its first
appearance in written texts to the present. Definitions are shown in an order such that the
meaning in which a word is used allows the reader to estimate the
time period during which that word was in use.</p>
      <p>The historical dictionary of Arabic is very significant because it not only remedies
the present lack of an all-encompassing historical dictionary for Arabic speakers, but
also serves to preserve the Arab nation's common linguistic and intellectual legacy.
Hence, the chief goal of the dictionary is to safeguard the riches of Arabic cultural
heritage.</p>
      <p>
        According to [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], the creation of a historical dictionary can benefit
from automatic processing tools such as semantic analysis.
      </p>
      <p>One of the steps in creating a historical dictionary is extracting the
appropriate sense of a given word occurring in a given context and recording the
transformation of each word's meaning.</p>
      <p>To our knowledge, no research has addressed the issue of
disambiguating Arabic words in order to create a historical dictionary of the Arabic language.
WSD is the problem of identifying the meaning of a word within a specific
context.</p>
      <p>
        In this work, we present a workable method for Arabic WSD based on word
embeddings. More particularly, the proposed system uses an Arabic dictionary to
select word senses; the sense attributed to an ambiguous word is then the one that
possesses the closest semantic proximity to the local context. Our method consists of
first training word vectors on the corpus using Mikolov's Skip-gram model [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ],
followed by representing the context of the word to be disambiguated and all of its senses
as vectors in a multidimensional space. WSD is then performed by computing the
cosine similarity between the context vector and
the target word's sense vectors, the sense with the highest similarity being selected as the
disambiguated sense.
      </p>
      <p>The rest of this article is organized as follows: Section 2 describes the
main approaches used for WSD, Section 3 reviews word embeddings, Section 4 presents
our proposed WSD method, and Section 5 describes the experimental results of
this study. Finally, our conclusions and some future work are drawn in Section 6.</p>
    </sec>
    <sec id="sec-2">
      <title>Main approaches used for WSD</title>
      <p>WSD is a fundamental task in Natural Language Processing (NLP). The aim of WSD
is to assign the correct meaning, or sense, to a word in a given context. There are
three main approaches to WSD: the knowledge-based approach, the supervised approach and
the unsupervised approach.</p>
      <sec id="sec-2-1">
        <title>Knowledge-based approach</title>
        <p>
          Knowledge-based approaches rely on different knowledge sources such as
dictionaries, thesauruses and lexicons. They make use of one or more of these
sources of knowledge to associate the most appropriate senses with words in context.
Some of them are based on computing the word overlap between the sense
definitions of two or more target words [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]. This approach is known as gloss overlap, or
the Lesk algorithm [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]. Yet others have exploited a number of measures of
semantic similarity based on the network of semantic connections between word senses
in WordNet.
        </p>
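        <p>The gloss-overlap idea can be sketched as follows; this is a minimal illustration with made-up English glosses, not a component of the paper's system:</p>

```python
def lesk_overlap(context_words, sense_glosses):
    # Simplified Lesk: choose the sense whose dictionary gloss shares the
    # most words with the context of the ambiguous word.
    context = set(context_words)
    best_sense, best_overlap = None, -1
    for sense, gloss in sense_glosses.items():
        overlap = len(context.intersection(gloss.split()))
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense
```

        <p>In practice the glosses would come from a machine-readable dictionary, and the overlap count is often refined with stemming and stop-word removal.</p>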
      </sec>
      <sec id="sec-2-2">
        <title>Supervised approach</title>
        <p>
          Supervised approaches use an annotated training corpus to induce a classifier from
manually sense-annotated data sets. Usually, the classifier is concerned with a single word and
performs a classification task in order to assign the appropriate sense to each instance of
that word [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]. Supervised methods include decision lists, neural
networks and the Naive Bayes algorithm.
        </p>
      </sec>
      <sec id="sec-2-3">
        <title>Unsupervised approach</title>
        <p>
          These methods rely on unlabeled corpora and do not exploit any manually
sense-tagged corpus to provide a sense choice for a word in context. Such
approaches to WSD hinge upon the idea that the same sense of a word has similar neighboring
words. They induce word senses from input text by clustering word
occurrences and then classifying new occurrences into the induced clusters [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]. They are
divided into methods based on context clustering, word clustering and co-occurrence
graphs. The first represents each occurrence of a target word in a corpus as a
context vector; the vectors are then clustered into groups, each identifying a sense of
the target word. The second clusters words that are semantically similar and can
thus convey a specific meaning.
        </p>
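        <p>Context clustering, for instance, can be sketched with a minimal k-means over toy context vectors; the data, dimensionality and cluster count below are illustrative, not from the paper:</p>

```python
import numpy as np

def kmeans(vectors, k, iters=20, seed=0):
    # Minimal k-means: each resulting cluster of context vectors is taken
    # to stand for one induced sense of the target word.
    rng = np.random.default_rng(seed)
    centers = vectors[rng.choice(len(vectors), size=k, replace=False)]
    for _ in range(iters):
        # Assign every context vector to its nearest center.
        labels = np.argmin(
            np.linalg.norm(vectors[:, None, :] - centers[None, :, :], axis=2),
            axis=1)
        # Move each center to the mean of its assigned vectors.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = vectors[labels == j].mean(axis=0)
    return labels

# Toy demo: contexts of one word fall into two clearly separated groups.
contexts = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
labels = kmeans(contexts, k=2)
```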
        <p>
          Finally, the last family of methods builds a graph G = (V, E) whose vertices V
correspond to words in a text and whose edges E connect pairs of words that
co-occur in a syntactic relation, in the same paragraph, or in a larger context
[
          <xref ref-type="bibr" rid="ref4">4</xref>
          ].
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Word embedding</title>
      <p>Recently, much work has been done on representing the individual words of a language
as vectors in a multidimensional space that conveys the semantic information
contained in the words. Thanks to their ability to efficiently learn the semantics of
words, these representations can serve as a fundamental unit for a wide range of
Natural Language Processing tasks. In particular, word vectors have been shown to be
effective for WSD.</p>
      <p>
        In the past few years, much progress has been made on using neural networks to
represent words in vector space [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] and [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
      <p>
        Mikolov et al. [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] proposed two new methods for building word representations in
vector space: the continuous bag-of-words (CBOW) and Skip-gram models. These
methods are based on neural network architectures.
      </p>
      <p>
        CBOW predicts a pivot word using a window of contextual words around the pivot
from the same sentence. The objective of this network architecture is to correctly
classify the pivot word given its context by using log-linear classifiers [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. On the
other hand, the Skip-gram model aims at training a network that predicts the likelihood
of context words occurring around a given center word.
      </p>
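      <p>The Skip-gram prediction just described can be illustrated numerically; the vocabulary, dimensionality and random weights below are made up, and a real model would learn W_in and W_out by gradient descent:</p>

```python
import numpy as np

# Toy illustration of the Skip-gram objective: score every vocabulary word
# as a possible context word of a center word via a dot product followed by
# a softmax over the vocabulary.
rng = np.random.default_rng(0)
vocab = ["the", "car", "drives", "fast", "road"]
dim = 4
W_in = rng.normal(size=(len(vocab), dim))   # input (center word) vectors
W_out = rng.normal(size=(len(vocab), dim))  # output (context word) vectors

def context_probs(center_word):
    # P(context word | center word) for every word in the vocabulary.
    v = W_in[vocab.index(center_word)]
    scores = W_out @ v
    e = np.exp(scores - scores.max())       # numerically stable softmax
    return e / e.sum()

p = context_probs("car")
```

      <p>Negative sampling, used later in the paper's experiments, replaces this full softmax with a handful of binary decisions against sampled noise words, which is what makes training on large corpora practical.</p>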
      <p>Most of the work exploiting word representations in vector space for word sense
disambiguation has been applied to English. However, to our knowledge, no previous
work has investigated representing words as vectors for Arabic word
sense disambiguation.</p>
      <p>
        A number of approaches addressing the problem of word sense
disambiguation based on representing words as vectors in a multidimensional space have been
proposed in the past few years. Here are some examples:
─ [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] proposed a method to solve word sense disambiguation based on neural
models. In particular, they build an embedding of the context by concatenating or weighting
a sum of the embeddings of the words surrounding the target word. Sense
embeddings are then computed as weighted sums of the embeddings of the words in the
WordNet gloss of each sense.
─ The method presented by [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] is a supervised learning method for word sense
disambiguation based on the word vector embeddings of [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. The authors showed that
word embeddings can be used as additional features in a supervised WSD system.
      </p>
    </sec>
    <sec id="sec-4">
      <title>Proposed method</title>
      <p>As noted earlier, in recent years the idea of embedding words in a vector space
using neural network-inspired algorithms has had significant success in numerous
NLP tasks, mainly owing to its ability to capture semantic information from massive
amounts of textual content. Word sense disambiguation has therefore become even
more prominent with the advent of neural networks, as it can be solved efficiently
using this method. Word embeddings provide an efficient, affordable way of
measuring similarity between different words and building a semantic vector space. They
require no manual annotation, only large corpora of text, so any set of texts can be
used as a corpus.</p>
      <p>Here, we define our method for Arabic word sense disambiguation
based on word embeddings.</p>
      <p>
        The first step of the proposed method is to train on an Arabic corpus. For our
training corpus, we opted for the Historical Arabic Dictionary Corpus [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], which was originally
designed to build a historical dictionary. The dataset comprises around 86 million
words. We then use the Skip-gram model [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], a neural network based language
model, to learn the word vectors.
      </p>
      <p>
        After learning the word vectors with the Skip-gram model, the second step of the
proposed method is to assign vector representations to the context of use containing
the ambiguous word and to its senses, based on their definitions (glosses extracted from
dictionaries). We thus generate a context vector and sense vectors. Our
strategy for generating the context vector, inspired by [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], consists in summing the
vectors of the words surrounding the target word (a word is skipped if it is not a
content word). Similarly, each sense vector is generated as the sum of the vectors of
all content words in the corresponding sense definition of the ambiguous word.
      </p>
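      <p>This summing strategy can be sketched as follows, with a toy two-dimensional embedding table standing in for the Skip-gram vectors and a hypothetical stop-word list:</p>

```python
import numpy as np

# Toy embedding table and stop-word list; in the described method the
# vectors come from the Skip-gram model trained on the Historical Arabic
# Dictionary Corpus and the stop-word list is predefined.
embeddings = {"car": np.array([1.0, 0.0]), "fast": np.array([0.0, 1.0])}
stopwords = {"the", "is"}

def sum_vector(words, embeddings, stopwords, dim=2):
    # Sum the vectors of content words, skipping stop words and
    # out-of-vocabulary words; used for both context and sense (gloss) vectors.
    vecs = [embeddings[w] for w in words
            if w not in stopwords and w in embeddings]
    return np.sum(vecs, axis=0) if vecs else np.zeros(dim)

context_vec = sum_vector(["the", "car", "is", "fast"], embeddings, stopwords)
```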
      <p>The last step of our proposed method measures the similarity between the
different glosses of the ambiguous word and the current context by computing the cosine
similarity between the context vector and each sense vector of the ambiguous word.
We then choose the sense that yields the maximum cosine similarity as the
appropriate sense for the ambiguous word.</p>
      <p>
        Figure 1 below describes the principle of this method.
We use Skip-gram to train the word vectors from large amounts of text data; we
choose Skip-gram for its simplicity and effectiveness. The training objective of
Skip-gram is to predict the surrounding words given the current word [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <p>
        To train the Skip-gram model, we use a large amount of raw Arabic text from the
Historical Arabic Dictionary Corpus [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. This corpus, originally designed to build a
historical dictionary, contains texts in Classical Arabic and Modern Standard Arabic
from the 2nd up to the 21st century. It comprises several types of texts: poetry, the
Quran, literary prose, Hadiths, history and genealogy, religions and doctrines,
encyclopedias and dictionaries, journalistic texts, and geography and travel
literature.
      </p>
      <p>We applied several cleaning and normalization steps to the corpus, such as:
─ Removing punctuation marks and diacritics from each document in the Arabic
dataset.
─ Normalizing the letters (آ,إ,أ) to (ا)</p>
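      <p>A minimal sketch of these normalization steps, assuming Python, a generic non-word-character pattern for punctuation, and the standard Unicode range U+064B–U+0652 for Arabic diacritics (these choices are ours, not stated in the paper):</p>

```python
import re

DIACRITICS = re.compile(r"[\u064B-\u0652]")   # Arabic harakat (assumed range)
PUNCT = re.compile(r"[^\w\s]")                # any non-word, non-space character

def normalize(text):
    text = DIACRITICS.sub("", text)           # remove diacritics
    text = PUNCT.sub(" ", text)               # remove punctuation marks
    for alef in "آأإ":
        text = text.replace(alef, "ا")        # unify alef variants
    return " ".join(text.split())
```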
      <p>The compiled corpus comprises more than 86 million
words. Training the Skip-gram model requires choosing some parameters that affect the
resulting vectors:
─ Vector size: dimensionality of the word vectors.
─ Window: the amount of context to consider around the pivot word.
─ Sample: threshold for sub-sampling of frequent words.
─ Negative: number of negative examples used in training.
─ Frequency threshold: words appearing with a frequency below this threshold are
discarded.</p>
      <sec id="sec-4-1">
        <title>Sense and context representation</title>
        <p>After learning the word vectors using the Skip-gram model, we use the content
words' vectors in a sentence as the initialization of the context vector. We first
eliminate stop words from the original sentence using a predefined list of stop words;
stop words are eliminated because they have little semantic discrimination power in
our calculation. Let S1 = wn-k, ..., wn+k be a window of text surrounding a
focus word wn. Using a window size of 3 (three words on the left and three
words on the right of the ambiguous word), an embedding for the context is computed
as the sum of the embeddings of the words wi.</p>
        <p>In order to represent the sense definitions of the ambiguous word as vectors, we
initialize the sense vectors based on the glosses of the senses. To extract the glosses of
the ambiguous word, we use the Al-Mu'jam Al-Wasit dictionary.</p>
        <p>Each sense vector is therefore represented as the sum of the vectors of the
content words in the gloss.</p>
      </sec>
      <sec id="sec-4-2">
        <title>Similarity using cosine distance</title>
        <p>This last step consists in attributing to each ambiguous word its appropriate sense,
by choosing the sense with the closest semantic proximity to its context
of use.</p>
        <p>The degree of similarity between a sentence (containing an ambiguous word) and
each of its sense definitions is obtained by calculating the cosine similarity between the
context vector and the sense vector.</p>
        <p>The sense definition that obtains the highest score of similarity with the current
context will represent the most probable sense of the ambiguous word.</p>
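        <p>The selection step can be sketched as follows; the vectors and sense labels are hypothetical stand-ins for the real context and gloss vectors:</p>

```python
import numpy as np

def cosine(u, v):
    # Cosine similarity between two vectors.
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def pick_sense(context_vec, sense_vecs):
    # Return the sense whose vector is most similar to the context vector.
    return max(sense_vecs, key=lambda s: cosine(context_vec, sense_vecs[s]))

# Hypothetical two-sense example, in the spirit of comparing Sim(S, V1)
# with Sim(S, V2) below.
context = np.array([1.0, 0.2, 0.0])
senses = {"sense1": np.array([0.9, 0.1, 0.0]),
          "sense2": np.array([0.0, 0.1, 1.0])}
best = pick_sense(context, senses)
```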
        <p>For example:</p>
        <p>Let W = "جراٍسلا" (vehicle) be an ambiguous word and let S be the context of use of
W:</p>
        <p>S = ىا يٌرفاسولا يه جراولا يا جراٍسلا ضعت َطقتلٌ ةجلا حتاٍغ ًف ٍىقلاو فسىٌ اىلتقت لا نهٌه لئاق لاق
يٍلعاف نتٌك</p>
        <p>(One of them said, "Kill not Joseph, but if you must do something, cast him into
the bottom of a deep well; some of the travelers will pick him up.")</p>
        <p>We give in what follows a set of glosses for the word W = "جراٍسلا" given by the
dictionary Al-Mu'jam Al-Wasit:</p>
        <p>First gloss:
حلفاقلا يا جراٍسلا تقلطًا (The convoy set out)
Second gloss:</p>
        <p>لقٌلاو بىكرلل مذختست يٌسٌثلل كرحوت رٍست حٍلا حثكره(A vehicle with an engine used for
transport)</p>
        <p>The similarity between S and each sense definition is obtained as follows:
─ Step 1: context embedding.</p>
        <p>We use a window size of 3 words (including the ambiguous word), and we
represent the context in which the word occurs as a vector by summing word vectors.</p>
        <p>V(S) = V(ةجلا) + V(َطقتلٌ) + V(ضعت) + V(جراٍسلا) + V(جراولا) + V(يٌرفاسولا) + V(يٍلعاف)
─ Step 2: sense embedding.</p>
        <p>V1 = V(تقلطًا) + V(جراٍسلا) + V(حلفاقلا)</p>
        <p>V2 = V(حثكره) + V(حٍلا) + V(رٍست) + V(كرحوت) + V(يٌسٌثلل) + V(مذختست) + V(بىكرلل) +
V(لقٌلاو)
─ Step 3: Calculate the similarity.</p>
        <p>Sim(S, V1) = cos(V(S), V1)</p>
        <p>Sim(S, V2) = cos(V(S), V2)</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Experiments and results</title>
      <p>In order to measure the performance of the proposed method effectively, a large
collection is necessary. Comparable English systems were evaluated on Senseval-1 or
Senseval-2; in our work, however, we had to build our own experimental data from a
totally different set of resources.</p>
      <p>In our experiments, we used a test corpus containing 172 texts. From this
corpus, we extracted the contexts of use (examples) of each word to be disambiguated. The
sense of a target word was selected from the list of senses given by the
Al-Mu'jam Al-Wasit dictionary.</p>
      <p>In our work, ten words were chosen. For each of these ambiguous words we
evaluated 50 examples.</p>
      <p>We used the Word2vec toolkit1 to learn 300-dimensional vectors. We chose the
Skip-gram architecture, with negative sampling set to 10 and the window size set to
10 words.</p>
      <p>To measure the disambiguation rate, we used the most common evaluation
technique, which selects a small sample of words and compares the results of the
system with those of a human judge. The precision measure was used here. Experimental
results show a precision of 78% for texts in Modern Standard Arabic.</p>
      <p>In Table 2 below, we present the ten ambiguous words used in this paper and
report the statistics of the obtained precision.
1 code.google.com/archive/p/word2vec/
[Table 2: per-word precision for the ten ambiguous words; only part of the word
column (يٍع, حٌا, نئاق, اًٍد, جرئاط, عهاج, ًهارح) survives extraction, and the
precision values are not recoverable.]</p>
      <p>We can deduce from the above table that the average precision is
78%.</p>
      <p>According to Table 2, the weakest precision is obtained for the
ambiguous word "يٍع" (eye). This can be explained by the fact that this word has some
rarely occurring meanings which did not appear frequently in the corpus and
which are difficult to represent due to the lack of sufficient training examples.</p>
      <p>During the disambiguation process, we encountered the following problems:
─ For some of the considered words, we found meanings which appear in the corpus
but do not exist in the dictionary. For example, for the word "باثلا" (door), we
extracted a dozen sentences from the corpus where it stands for the name of a city
in Syria. A sample is given in the following:
فٌر ًف حٍهلاسلاا حلوذلا نٍظٌت لقاعه زرتأ باثلا حٌٌذوت حٌىٍح عقاىه ىلع رحلا يرىسلا شٍجلا رطٍس
ًقرشلا ةلح
(The Free Syrian Army took control of vital sites in the city of Al-Bab, the most
important stronghold of the Islamic State in Aleppo's eastern countryside).
─ If a sense from the Arabic dictionary has an insufficient number of words in its
gloss, the vector of that sense is inaccurate.</p>
    </sec>
    <sec id="sec-6">
      <title>Conclusion and future works</title>
      <p>This work explores the possibility of using word embeddings to solve the word sense
disambiguation problem. The proposed method measures the semantic relation
between the context of use of the ambiguous word and its sense definitions. It is
carried out by training word vectors on the corpus using Mikolov's Skip-gram
model and by representing the context of the word to be disambiguated, and all of its
senses, as vectors in a multidimensional space. We apply cosine similarity to
compare the context vector with the target word's sense vectors, the sense with the
highest similarity being selected as the disambiguated sense.</p>
      <p>For a sample of 10 ambiguous Arabic words, experiments have shown a precision
of 78%.</p>
      <p>In future work, we propose to train our model on a larger corpus (by
integrating other texts). We also propose to test our model on a larger test corpus (by
adding other contexts of use for each ambiguous word) and to integrate IDF
weighting and Part-of-Speech tagging of the context of use and the sense definitions, in
order to support the identification of words that are highly descriptive in the
examined context of use.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Al-Said</surname>
            ,
            <given-names>A. B.</given-names>
          </string-name>
          :
          <article-title>Computerizing the Historical Arabic Dictionary</article-title>
          ,
          <source>al-Lisan al-Arabi journal</source>
          , al-Ribat, vol.
          <volume>74</volume>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Al-Said</surname>
            ,
            <given-names>A. B.</given-names>
          </string-name>
          , and
          <string-name>
            <given-names>L.</given-names>
            <surname>Medea-‐García</surname>
          </string-name>
          .:
          <article-title>The Historical Arabic Dictionary Corpus and its Suitability for a Grammaticalization Approach</article-title>
          , 5th international conference in linguistics,
          <source>Gramatyka i korpus - Grammar and Corpora</source>
          , http://www.iszip.uw.edu.pl, Institute of Western and Southern Slavic Studies, University of Warsaw, Poland., (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Mikolov</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sutskever</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Corrado</surname>
            ,
            <given-names>G. S.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Dean</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Distributed representations of words and phrases and their compositionality</article-title>
          .
          <source>In Advances in Neural Information Processing Systems</source>
          , (
          <year>2013b</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Navigli</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Word sense disambiguation: a survey</article-title>
          ,
          <source>ACM Computing Surveys</source>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Lesk</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Automatic sense disambiguation using machine readable dictionaries: how to tell a pine cone from an ice cream cone</article-title>
          .
          <source>The 5th annual international conference on systems documentation (</source>
          <year>1986</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Levy</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Goldberg</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Neural word embedding as implicit matrix factorization</article-title>
          .
          <source>In Advances in Neural Information Processing Systems</source>
          , pages
          <fpage>2177</fpage>
          -
          <lpage>2185</lpage>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Zahran</surname>
            ,
            <given-names>M. A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Magooda</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mahgoub</surname>
            ,
            <given-names>A. Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Raafat</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rashwan</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Atyia</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Word representations in vector space and their applications for Arabic</article-title>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Sun</surname>
            ,
            <given-names>M.:</given-names>
          </string-name>
          <article-title>A unified model for word sense representation and disambiguation</article-title>
          .
          <source>In EMNLP</source>
          , pages
          <fpage>1025</fpage>
          -
          <lpage>1035</lpage>
          , (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Taghipour</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          and Hwee Tou Ng.:
          <article-title>Semi-Supervised Word Sense Disambiguation Using Word Embeddings in General and Specific Domains</article-title>
          .
          <source>In Proceedings of the 2015 Annual Conference of the NAACL</source>
          , pages
          <fpage>314</fpage>
          -
          <lpage>323</lpage>
          , Denver, Colorado, (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Collobert</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Weston</surname>
            ,
            <given-names>J.:</given-names>
          </string-name>
          <article-title>A unified architecture for natural language processing: Deep neural networks with multitask learning</article-title>
          .
          <source>In Proceedings of the 25th ICML</source>
          , pages
          <fpage>160</fpage>
          -
          <lpage>167</lpage>
          , Helsinki, Finland, (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Nagoudi</surname>
            ,
            <given-names>E. M. B.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Schwab</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Semantic Similarity of Arabic Sentences with Word Embeddings</article-title>
          ,
          <source>The Third Arabic Natural Language Processing Workshop</source>
          , (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>