<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>A Linguistic Method into Stemming of Arabic for Data Compression</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Hussein Soori</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jan Platos</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Vaclav Snasel</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Faculty of Electrical Engineering and Computer Science, VSB-Technical University of Ostrava</institution>
          ,
          <addr-line>Czech Republic</addr-line>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2013</year>
      </pub-date>
      <fpage>119</fpage>
      <lpage>128</lpage>
      <abstract>
        <p>Good stemming rules for Arabic are needed because Arabic is the sixth most used language in the world. Stemming is very important in information retrieval, data mining and language processing, and the complex morphology and grammatical properties of Arabic pose a challenge for researchers in this field. In this paper, we use an online morphological parser to distinguish parts of speech (POS), then set extraction rules to produce stems, and finally match these stems against an electronic dictionary. As a pilot study for this method, we deal with three POS: nouns, verbs and adjectives.</p>
      </abstract>
      <kwd-group>
        <kwd>Stanford Online Parser</kwd>
        <kwd>data compression for Arabic</kwd>
        <kwd>Arabic natural language processing</kwd>
        <kwd>Arabic data mining</kwd>
        <kwd>Arabic morphology</kwd>
        <kwd>stemming of Arabic</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        The rapidly growing number of computer and Internet users in the Arab world, and the
fact that Arabic is the sixth most used language in the world today,
create a demand for more research on data mining and natural language
processing for Arabic. Two further factors may be that the Arabic alphabet is the
second most widely used alphabet in the world after the Latin alphabet (Arabic script has been
adapted to such diverse languages as Amazigh (Berber), Hausa and Mandinka (in
West Africa), Hebrew, Malay (Jawi in Malaysia and Indonesia), Persian, the Slavic
languages, Spanish, Sudanese, Swahili (in East Africa), Turkish, Urdu and some other
languages [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]), and that Arabic is one of the six
languages used in the United Nations [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ].
The special nature of Arabic script poses a few challenges for researchers.
Arabic is considered one of the highly inflectional languages, with
complex morphology. Unlike most other languages, it is written horizontally from right to
left. It has 28 main letters. The shape of each letter depends on its position in a
word (initial, medial or final), and there is a fourth form for a letter written
alone. The letter (ع ) is one example of a letter that takes all four forms.
Moreover, the letters alif, waw and ya (standing for glottal stop, w and y,
respectively) are also used to represent the long vowels a, u and i. Arabic letters are linked
within a word, which is very different from the Roman alphabet, whose letters are naturally
not linked. Other orthographic challenges include
the persistent and widespread variation in the spelling of letters such as hamza (ء)
and ta’ marbuTa ( ة ), as well as the increasing lack of differentiation between
word-final ya ( ي ) and alif maqSura ( ى ). Typists often neglect to insert a space after
words that end with a non-connector letter such as و , ز , ر [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. In addition to that,
Arabic has eight short vowels and diacritics (َ , , ٌ , ٍ , ً , ْ , ُ , ِ ).
Typists normally leave them out of a text; in texts where they are present,
they are normalized (removed) beforehand to avoid any mismatch with the
dictionary or corpus in light stemming. As a result, the letters in the decompressed text
appear without these diacritics.
      </p>
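      <p>The normalization step described above can be sketched as follows. This is a minimal illustration in Python, not the authors' implementation; the function name strip_diacritics is hypothetical, and the set of marks is assumed to be the Unicode range U+064B to U+0652 (the three tanwin marks, fatha, damma, kasra, shadda and sukun).</p>
      <preformat><![CDATA[
```python
# Assumption: the eight marks are the Unicode code points U+064B..U+0652
# (fathatan, dammatan, kasratan, fatha, damma, kasra, shadda, sukun).
ARABIC_DIACRITICS = {chr(cp) for cp in range(0x064B, 0x0653)}


def strip_diacritics(text):
    """Return text with the Arabic short-vowel marks removed; letters are kept."""
    return "".join(ch for ch in text if ch not in ARABIC_DIACRITICS)


# Example: raa + fatha, jiim + damma, laam  ->  raa, jiim, laam
vocalized = "\u0631\u064E\u062C\u064F\u0644"
plain = strip_diacritics(vocalized)
```
]]></preformat>
      <p>Because only combining marks are dropped, a text that never contained diacritics passes through unchanged, which is what allows diacritized and undiacritized input to be matched against the same dictionary.</p>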
      <p>
        Diacritization has always been a problem for researchers. According to Habash [12],
since diacritics occur so infrequently in Arabic text, most researchers remove them from the
text. Other text recognition studies in Arabic include Andrew
Gillies et al. [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], John Trenkle et al. [30] and Maamouri et al. [20].
      </p>
      <p>Other than letters, another factor determines word identity and in many instances
can change the meaning and part of speech: the eight short vowels and
diacritics listed above. The following table gives an example for (رجل ),
where adding diacritics to the same three letters produces words that differ
in meaning and in part of speech:</p>
      <sec id="sec-1-1">
        <title>Word</title>
        <p>لُ جُر
ل جُر
ل جْر
ل جر</p>
      </sec>
      <sec id="sec-1-2">
        <title>Meaning</title>
        <p>man
man
foot
to go on foot (rather than, e.g., ride a bike)</p>
      </sec>
      <sec id="sec-1-3">
        <title>Part of Speech</title>
        <p>
          noun (subject)
noun (object)
noun
verb
        </p>
        <p>
          Nevertheless, it is generally advised that these vowels and diacritics be
normalized before processing in most light stemming or morphological approaches [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ].
The main reasons given for excluding them from processing are the claim that
they occur so infrequently, and that in Modern Standard Arabic (MSA) people
tend not to use them, so that the meaning is left to the native
speaker’s intuition or, in some cases, determined from the context. This problem
still awaits an approach in which the processor can handle words
with or without diacritics, without needing to normalize them.
        </p>
        <p>Another morphological feature of Arabic is that, unlike Roman letters, which are
naturally separate, Arabic script is agglutinated (as mentioned above): letters
are linked to each other in some cases and unlinked in others, depending
on the position of the letter at the root, stem and word level. For example, in English the
pronoun (he) in (he plays) is separated from the following verb (plays), while in
Arabic the pronoun is represented by the letter (ي ), which is linked to the root verb لعب to
form يلعب (he plays). The same is true of the different kinds of affixes.
Arabic has four types of affixes. Prefixes: these are letters (normally one) that change
the tense of the verb from past to present, such as the letter (ي ) in the case of the verbs لعب
and يلعب above. Suffixes: these represent the inflectional endings of
verbs, as well as the feminine and dual/plural markers for nouns. Postfixes: these
are the pronouns attached at the end of the word. Antefixes: these are prepositions
agglutinated to the beginning of words.</p>
        <sec id="sec-1-3-1">
          <title>1.2 The Problem at Hand</title>
          <p>
            This paper tries to improve the rules for stemming Arabic texts for data
compression. We have used a few different linguistic methods in the past, for example
the vowel letter method [
            <xref ref-type="bibr" rid="ref2">2</xref>
            ]. This method was mainly dependent on syllabification of
words and focused on splitting words according to vowel letters. The second approach
[
            <xref ref-type="bibr" rid="ref8">8</xref>
            ] was a simple approach to stemming rules, in which four categories of words
(nouns, verbs, adjectives and adverbs) were selected from short news texts. These two
approaches produced some good results. However, two major problems showed up.
The first problem concerns the recognition of parts of speech (POS). For
example, the verb يلعب (plays) starts with the letter (ي ). In Arabic, adding the prefix (ي )
is a very common way to change a verb from its past form into its present form.
When rules are set to remove the letter (ي ) so as to produce the root form لعب ,
these rules also remove the letter (ي ) from other POS, such as the word
يمن (Yemen), where the letter (ي ) is part of the root word.
          </p>
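          <p>The first problem can be illustrated with a short sketch. The code below is a hypothetical Python illustration, not the authors' software: it contrasts a POS-blind rule with a rule that is applied only when the parser tag is VBP (imperfect verb). The tag NNP for the proper noun and both function names are assumptions made for this example.</p>
          <preformat><![CDATA[
```python
YAA = "\u064A"  # the letter yaa


def strip_yaa_blind(word):
    """POS-blind rule: drop a leading yaa from any word."""
    return word[1:] if word.startswith(YAA) else word


def strip_yaa_by_pos(word, pos):
    """POS-aware rule: drop a leading yaa only from imperfect verbs (tag VBP)."""
    if pos == "VBP" and word.startswith(YAA):
        return word[1:]
    return word


verb_plays = "\u064A\u0644\u0639\u0628"  # yaa laam ayn baa, 'he plays'
noun_yemen = "\u064A\u0645\u0646"        # yaa miim nuun, 'Yemen'

damaged = strip_yaa_blind(noun_yemen)            # loses a root letter
kept = strip_yaa_by_pos(noun_yemen, "NNP")       # proper noun is left intact
stem = strip_yaa_by_pos(verb_plays, "VBP")       # laam ayn baa
```
]]></preformat>
          <p>Gating each rule on the tag is what makes a reliable POS tagger a prerequisite for this kind of rule-based stemming.</p>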
          <p>
            The second problem occurs within the sub-POSs when, for example, trying to remove
the determiner ال (the definite article 'the') from common nouns, as in الطالب (the
student). The rules remove the ال from all nouns, including proper nouns such as
المانيا (Germany), where the ال is part of the original noun and not a determiner.
For these reasons, in this paper we use the Stanford Online Parser [
            <xref ref-type="bibr" rid="ref9">9</xref>
            ] to better categorize
the different POS, and then match the output words, after stemming, against an
electronic dictionary.
          </p>
        </sec>
        <sec id="sec-1-3-2">
          <title>1.3 The Stanford Online Parser</title>
          <p>
            The Stanford parser is a powerful online parser that parses texts in three languages:
Arabic, Chinese and English. The parser uses dependency grammar. The Arabic
part of the parser [
            <xref ref-type="bibr" rid="ref9">9</xref>
            ] depends on the Penn Arabic Treebank project, which was launched in
2001 at the University of Pennsylvania and headed by Prof. Mohamed Maamouri.
According to the corpus documentation [
            <xref ref-type="bibr" rid="ref10">10</xref>
            ], this corpus is designed for those who
study or use languages professionally or academically, as well as for those who need
text corpora in their work. The Penn Arabic Treebank is particularly suitable for
language developers, computational linguists and computer scientists who are interested
in various aspects of natural language processing.
          </p>
        </sec>
        <sec id="sec-1-3-3">
          <title>1.4 The Arabic Alphabet Transliteration System</title>
          <p>In this study, we use a transliteration system for the Arabic alphabet to enable
non-Arabic speakers to identify Arabic letters and to understand the proposed rules. A
legend of Arabic letters and their English transliterations is provided in Table 1.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>2. Stemming Rules</title>
      <p>According to the Stanford Online Parser for Arabic, there are 27 different POSs.
In this paper, rules are set for three main POSs: nouns, verbs and adjectives.
The rule for every POS or sub-POS is divided into steps, as shown below, and the steps
are to be applied in numbered order.</p>
      <sec id="sec-2-1">
        <title>Specifications</title>
        <sec id="sec-2-1-1">
          <title>W – any word or part of a word (word refers to any POS in the rule: noun, verb, adjective, etc.); [] – Arabic letter</title>
        </sec>
        <sec id="sec-2-1-2">
          <title>Ins(x, y) – returns true when x is anywhere in y; |x| – length of word x; [x]W – letter x is at the beginning of the word</title>
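          <p>One possible reading of this notation as Python helpers is sketched below. The function names are hypothetical, and has_suffix covers the form W[x], which the rules use for a letter at the end of a word although it is not listed in the specifications; its interpretation here is an assumption.</p>
          <preformat><![CDATA[
```python
def ins(x, y):
    """Ins(x, y): true when x occurs anywhere in y."""
    return x in y


def length(x):
    """|x|: number of letters in the word x."""
    return len(x)


def has_prefix(x, word):
    """[x]W: the letter sequence x is at the beginning of the word."""
    return word.startswith(x)


def has_suffix(x, word):
    """W[x]: assumed to mean that x is at the end of the word."""
    return word.endswith(x)
```
]]></preformat>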
        </sec>
      </sec>
      <sec id="sec-2-2">
        <title>Nouns Rules:</title>
        <p>a) DTNN: determiner + singular common noun</p>
        <sec id="sec-2-2-1">
          <title>Step 1: [alif laamAlif laamAlif]W -&gt; [alif laam]W</title>
        </sec>
        <sec id="sec-2-2-2">
          <title>Step 2: [alif laamAlif]Wxy -&gt; [alif laam]Wy</title>
          <p>b) DTNNP: determiner + singular proper noun</p>
        </sec>
        <sec id="sec-2-2-3">
          <title>Step 1: [alif laam]W -&gt; W</title>
          <p>c) DTNNS: determiner + plural common noun</p>
        </sec>
        <sec id="sec-2-2-4">
          <title>Step 1: [alif laam]W -&gt; W</title>
          <p>d) NNPS: common noun, plural or dual
Step 1: W[ta] -&gt; W</p>
        </sec>
        <sec id="sec-2-2-5">
          <title>W[yaa nuun] -&gt; W</title>
          <p>Step 2: |W| &lt; 5 -&gt; W[taMarboota]</p>
        </sec>
        <sec id="sec-2-2-6">
          <title>Step 3: W[waaw][taMarboota] -&gt; W[taMarboota]</title>
        </sec>
      </sec>
      <sec id="sec-2-3">
        <title>Verbs Rules:</title>
        <p>a) VBD: perfect verb (***nb: perfect rather than past tense)
Step 1: |[waaw]W|&gt;2 -&gt; W
Step 2: W[alif] -&gt; W</p>
        <p>W[ta] -&gt; W</p>
        <sec id="sec-2-3-1">
          <title>W[waaw nuun] -&gt; W</title>
        </sec>
        <sec id="sec-2-3-2">
          <title>Step 3: W[alif haa] -&gt; W[alifMaqsoora]</title>
        </sec>
        <sec id="sec-2-3-3">
          <title>W[ta haa] -&gt; W[alifMaqsoora]</title>
          <p>b) VBN: passive verb (***nb: passive rather than past participle)
Step 1: [yaa]W -&gt; W
Step 2: |W| = 4 &amp; [ta]W -&gt; [alif]W
c) VBP: imperfect verb (***nb: imperfect rather than present tense)
Step 1: [ta]W -&gt; W
[ta ta]W -&gt; W
[yaa]W -&gt; W
Step 2: W[waaw] -&gt; W
Step 3: [nuun]W -&gt; W
[waaw nuun]W -&gt; W
[haa]W -&gt; W
[haa alif]W -&gt; W
Step 4: |W| = 2 -&gt; W[alifMaqsoora]</p>
        </sec>
        <sec id="sec-2-3-4">
          <title>Step 5: W[yaa] -&gt; [alif]W[alifMaqsoora]</title>
        </sec>
        <sec id="sec-2-3-5">
          <title>Step 6: [siin]W &amp; ins(W, [ta]) -&gt; [alif][siin]W</title>
        </sec>
        <sec id="sec-2-3-6">
          <title>Step 7: W[waaw laam] -&gt; W[alif laam]</title>
        </sec>
        <sec id="sec-2-3-7">
          <title>W[waaw laam waaw nuun] -&gt; W[alif laam]</title>
        </sec>
        <sec id="sec-2-3-8">
          <title>W[waaw nuun] -&gt; W[alif laam]</title>
          <p>Step 8: [nuun][ta]W &amp; |[nuun][ta]W| &gt; 3 -&gt; [nuun]W</p>
        </sec>
      </sec>
      <sec id="sec-2-4">
        <title>Adjectives Rules:</title>
        <p>a) DTJJ: determiner + adjective</p>
        <sec id="sec-2-4-1">
          <title>Step 1: [alif laam]W -&gt; W</title>
        </sec>
        <sec id="sec-2-4-2">
          <title>Step 2: W[taMarboota] -&gt; W</title>
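          <p>The two DTJJ steps can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' code: the function name stem_dtjj and the sample word are hypothetical, and the steps are applied in the numbered order given above.</p>
          <preformat><![CDATA[
```python
ALIF_LAAM = "\u0627\u0644"  # alif + laam, the definite article
TA_MARBOOTA = "\u0629"      # ta marboota


def stem_dtjj(word):
    """Apply the two DTJJ steps in order and return the resulting stem."""
    # Step 1: [alif laam]W -> W
    if word.startswith(ALIF_LAAM):
        word = word[len(ALIF_LAAM):]
    # Step 2: W[taMarboota] -> W
    if word.endswith(TA_MARBOOTA):
        word = word[:-len(TA_MARBOOTA)]
    return word


# Hypothetical sample: alif laam jiim daal yaa daal ta-marboota ('the new', feminine)
sample = "\u0627\u0644\u062C\u062F\u064A\u062F\u0629"
stem = stem_dtjj(sample)  # jiim daal yaa daal
```
]]></preformat>
          <p>The sketch should only be called on words that the parser has tagged DTJJ; applied to other tags it would reproduce the over-stemming problem described in the introduction.</p>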
        </sec>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Experiments</title>
      <p>The suggested rules must be tested against real data. For this purpose, we use
news articles from the BBC Arabic and Al Jazeera Arabic news portals. These
articles are parsed by the Stanford Online Parser and the results are shown in Table 2.
Repeated words are deleted, and sample words of every POS or
sub-POS are shown in the table.
b) DTNNP: determiner + singular proper noun
c) DTNNS: determiner + plural common noun</p>
      <p>كيسكملا جاعلا ةحودلا ليزاربلا تنرتنلاا
تارشعلا تاكرشلا نويطارقوميدلا نويروهمجلا تايولولاا نييكيرملاا
تاجتنملا تاثولملا تلاماعملا نيرتشملا نوظفاحملا ةيامحلل قيقحتل
تايلاولا تاصنملا نوبودنملا
d) NNPS: common noun, plural or dual
تايمك تايلمع تارشع نينط تاقبط تاونس تاعامج تاريدقت تارييغت
تاعمتجم
تاصنم تاحرتقم</p>
      <p>Verbs
a) VBD: perfect verb (***nb: perfect rather than past tense)
انناكماب ديا
نوفيضيو تمدقو</p>
      <p>عرتقا تحبصا حبصا
دقو لاقو ديزتو ىضم
تجرختسا
ناك دقف
ىدا
تلعج
هترجا اهارجا
ابرست ببست
لوقيو
b) VBN: passive verb (***nb: passive rather than past participle)
جردي مدختست حجري بحست دجوي
c) VBP: imperfect verb (***nb: imperfect rather than present tense)
غلبي
لوقي
ددهي
لجست جرختست ضبرت يوتحت اهحيتت</p>
      <p>ودبي ددهت جتنت هظحلات دعت
اهجرخي ىئبتخي يوتحي نولواحي
عقي عرتقي حبصي لكشي نومسي
حجني يهتني رشتني نكمي لثمي</p>
      <p>فقوتت
وفطت اطت
اهلوانتي
ديفتسي
سرامي
عزوتت سفانتت ودبت
حبصت ريشت كراشت
متي قوفتي ضرعتي
قحتسي مهاسي لازي
نوكي داكي نولوقي</p>
      <sec id="sec-3-1">
        <title>Adjectives</title>
        <p>a) DTJJ: determiner + adjective</p>
        <p>Before any rule is applied, all words must be normalized and preprocessed. We store
all words in plain text files using codepage 1256 – Arabic. Because all our software is
written in C+, we read these text files into Unicode representation.</p>
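        <p>The conversion from codepage 1256 to Unicode can be illustrated in Python, whose standard codecs include cp1256. This is only a sketch of the decoding step, not the authors' software; the function name decode_cp1256 is hypothetical.</p>
        <preformat><![CDATA[
```python
def decode_cp1256(raw):
    """Decode bytes stored in Windows codepage 1256 (Arabic) into a Unicode string."""
    return raw.decode("cp1256")


# Round trip for a three-letter word: laam ayn baa
word = "\u0644\u0639\u0628"
stored = word.encode("cp1256")   # one byte per letter in this codepage
restored = decode_cp1256(stored)
```
]]></preformat>
        <p>Once the text is held as Unicode, string operations such as prefix and suffix tests work on whole letters, independently of the byte encoding used on disk.</p>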
        <p>Our results for the noun rules are depicted in Tables 3, 4, 5 and 6. The
noun rules produced very good results for DTNNP and DTNN. The few
undesirable results arose because some words, such as (بلاكلاب ), were wrongly parsed by the
parser. For DTNNS, more rules are needed to deal with the
plural and dual suffixes. NNPS produced very good results.
The results of the verb rules are depicted in Tables 7, 8 and 9. The verb rules produced
good results for VBD and VBN. However, for VBP a few bad results
show up, and the rules have to be enhanced in the future.
ىرجا ىرجا ىدا جرختسا حبصا حبصا عرتقا ديا نناكماب ببست برست تلعج
دقف ناك ىضم ديزت لاق دق تمدق لوقي فيضي
The results for the adjective rules are depicted in Table 10. Almost all rules made
for adjectives produced successful results.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Conclusion</title>
      <p>In this paper we set stemming rules by POS. To parse our training data, we used the Stanford
Online Parser for Arabic, which identifies 27 different POSs. The rules
are set for three main POSs: nouns, verbs and adjectives. Every rule for every
POS or sub-POS is divided into one or more steps.</p>
      <p>The noun rules produced very good results for DTNNP and
DTNN. The few undesirable results occurred because some words, such as (بلاكلاب ),
were wrongly parsed by the parser. For DTNNS, more rules are needed to deal
with the plural and dual suffixes. NNPS produced very good results. The results of the
verb rules are depicted in Tables 7, 8 and 9. The verb rules produced very good results
for VBD and VBN. However, for VBP a few bad results show up, and
the rules have to be enhanced in the future. The results for the adjective rules are
depicted in Table 10. Almost all rules made for adjectives produced very good results.
Most errors occurred in the case of VBP. Overall, however, the rules
produced very good results. In the future, these rules must be
improved and extended to include more POSs, and should be tested against a wider
variety of vocabulary and bigger corpora.</p>
      <p>Acknowledgments: This work was partially supported by the Grant Agency of the
Czech Republic under grant no. P202/11/P142, SGS in VSB – Technical University
of Ostrava, Czech Republic, under the grant No. SP2013/70, and has been elaborated
in the framework of the IT4Innovations Centre of Excellence project, reg. no.
CZ.1.05/1.1.00/02.0070 supported by Operational Programme 'Research and
Development for Innovations' funded by Structural Funds of the European Union and state
budget of the Czech Republic and by the Bio-Inspired Methods: research,
development and knowledge transfer project, reg. no. CZ.1.07/2.3.00/20.0073 funded by
Operational Programme Education for Competitiveness, co-financed by ESF and state
budget of the Czech Republic.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>Encyclopedia</given-names>
            <surname>Britannica</surname>
          </string-name>
          <article-title>Online</article-title>
          . Alphabet.
          <string-name>
            <surname>Online</surname>
          </string-name>
          (
          <year>2011</year>
          ). URL: http://www.britannica.com/EBchecked/topic/17212/alphabet
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>H.</given-names>
            <surname>Soori</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Platos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Snasel</surname>
          </string-name>
          , H. Abdulla,
          <source>in Digital Information Processing and Communications, Communications in Computer and Information Science</source>
          , vol.
          <volume>188</volume>
          , ed. By V.
          <string-name>
            <surname>Snasel</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>Platos</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          <string-name>
            <surname>El-Qawasmeh</surname>
          </string-name>
          (Springer Berlin Heidelberg,
          <year>2011</year>
          ), pp.
          <fpage>97</fpage>
          -
          <lpage>105</lpage>
          . URL http://dx.doi.org/10.1007/978-3-642-22389-1_9
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3. T. Buckwalter, in Arabic Computational Morphology, Text,
          <source>Speech and Language Technology</source>
          , vol.
          <volume>38</volume>
          , ed. by
          <string-name>
            <given-names>N.</given-names>
            <surname>Ide</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Veronis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Soudi</surname>
          </string-name>
          ,
          <string-name>
            <surname>A.</surname>
          </string-name>
          v.d. Bosch, G. Neumann (Springer Netherlands,
          <year>2007</year>
          ), pp.
          <fpage>23</fpage>
          -
          <lpage>41</lpage>
          . URL http://dx.doi.org/10.1007/978-1-4020-6046-5_3
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>N.Y.</given-names>
            <surname>Habash</surname>
          </string-name>
          ,
          <source>Synthesis Lectures on Human Language Technologies</source>
          <volume>3</volume>
          (
          <issue>1</issue>
          ),
          <volume>1</volume>
          (
          <year>2010</year>
          ).
          DOI 10.2200/S00277ED1V01Y201008HLT010. URL http://www.morganclaypool.com/doi/abs/10.2200/S00277ED1V01Y201008HLT010 (last accessed 10/12/2012)
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>A.</given-names>
            <surname>Gillies</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Erl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Trenkle</surname>
          </string-name>
          , S. Schlosser,
          <source>in Proceedings of the Symposium on Document Image Understanding Technology</source>
          (
          <year>1999</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>J.</given-names>
            <surname>Trenkle</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gilles</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Eriandson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Schlosser</surname>
          </string-name>
          , S. Cavin, in Symposium on Document Image Understanding Technology (
          <year>2001</year>
          ), pp.
          <fpage>159</fpage>
          -
          <lpage>168</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>M.</given-names>
            <surname>Maamouri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bies</surname>
          </string-name>
          , S. Kulick,
          <source>in Proceedings of the British Computer Society</source>
          Arabic NLP/MT Conference (
          <year>2006</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Soori</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Platoš</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Snášel</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>Simple stemming rules for Arabic language</article-title>
          ,
          <source>Advances in Intelligent Systems and Computing</source>
          , Volume
          <volume>179</volume>
          AISC,
          <year>2012</year>
          , Pages
          <fpage>99</fpage>
          -
          <lpage>108</lpage>
          , ISBN: 978-3-642-31602-9
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>Spence</given-names>
            <surname>Green</surname>
          </string-name>
          and
          <string-name>
            <given-names>Christopher D.</given-names>
            <surname>Manning</surname>
          </string-name>
          .
          <year>2010</year>
          .
          <article-title>Better Arabic Parsing: Baselines, Evaluations, and Analysis</article-title>
          .
          <source>In 23rd Conference on Computational Linguistics</source>
          , pages
          <fpage>394</fpage>
          -
          <lpage>402</lpage>
          , Beijing, China.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>10. http://www.ircs.upenn.edu/arabic/Jan03release/README.txt (last accessed 10/03/2013)</mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>11. http://www.un.org/ (last accessed 10/03/2013)</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>