<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Extraction and Separation of Words From bilingual printed document</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Rabeb Ben Abdelbaki</string-name>
          <email>benabdelbakira@yahoo.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sofiene Haboubi</string-name>
          <email>sofiene.haboubi@istmt.rnu.tn</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Arabic; Latin; Discrimination; morphology; Dilation; Structural</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>SIDOP Research Group Signal, Image and Information Technologies National Engineering School of Tunis BP 37 Belvedere Tunis</institution>
          ,
          <addr-line>TN-1002</addr-line>
          ,
          <country country="TN">Tunisia</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>In this paper, we present our work on the extraction and separation of words from bilingual printed documents. The approach is based on the structuring element of the morphological dilation. We report results for Arabic, Latin, and bilingual Arabic-Latin scripts, show the limitations of the approach, and present possible improvements.</p>
      </abstract>
      <kwd-group>
        <kwd>Script</kwd>
        <kwd>Separation</kwd>
        <kwd>mathematical element</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>I. INTRODUCTION</title>
      <p>Character recognition systems, known as OCR (Optical
Character Recognition), find the characters forming a text,
recognize them individually, and then validate them through
lexical recognition of the words that contain them. In other words,
OCR is the process that turns a scanned paper document into
digital text.</p>
      <p>Because a given OCR system handles a single language script,
an important problem appears when the document is no
longer monolingual. If the document is multilingual,
the OCR loses its ability to read it, because the features it
relies on depend on the structural properties of the characters
and on the style and type of writing, which generally differ
from one script to another. Therefore, it is imperative to identify
the languages present in the document in order to redirect each
part to the appropriate character recognizer.</p>
      <p>In practice, we cannot speak of discrimination between scripts
without involving document segmentation. The
segmentation of documents into words is an important step in
the process of document recognition, and this phase becomes
crucial in the case of multilingual documents. It is the
foundation of all the following steps, and it also increases the
efficiency of a recognition system.</p>
      <p>Segmentation of the text into words takes place at the
script-discrimination step in the local approach, which uses
words as the components to be studied and therefore
requires prior knowledge of the individual words that make up the
document. It is also a necessary step for discriminating between
printed and handwritten documents and for any system for the
automatic processing of multilingual documents. This task is
usually delicate and complex in the multilingual context,
given the large differences between scripts in letter shapes,
spacing, etc. Our work
aims to develop a new method for the separation and extraction
of words in a bilingual printed document. In the following, we
first state the characteristics of the Arabic and Latin
scripts and then mention some related work. After that, we
present our method for the separation and extraction of words
from bilingual printed documents. Finally, we interpret
the results of this work.</p>
      <p>II. MORPHOLOGICAL CHARACTERISTICS OF ARABIC AND LATIN SCRIPTS</p>
      <p>A script is the graphical representation of a language through
signs drawn on a support. Since its appearance around the
third millennium BC, writing has continued to evolve with the
languages it represents. In this paper, we focus on the
Latin and Arabic scripts.</p>
    </sec>
    <sec id="sec-2">
      <title>A. Arabic script</title>
      <p>The Arabic script is a consonantal script, composed of 28
letters, excluding the "hamza", which behaves either as a full
letter or as a diacritic and the symbol "~" which is
written only on the support of the character "ﺍ".</p>
      <p>An Arabic character can take up to four different forms
depending on its position in the word or pseudo-word:
initial, medial, final, or isolated.</p>
      <p>The Arabic script is semi-cursive. Letters are
generally linked to each other, and Arabic words can be
composed of one or more pseudo-words written from right to
left, in both printed and handwritten form.</p>
      <p>Several Arabic letters have the same body and differ only in
the number and location of their diacritical marks. These diacritics
can be above or below the baseline, in different places
depending on the character, but never both above and below
simultaneously (Figure 1).</p>
      <p>In Arabic, there are 15 letters, presented in Table 1, among
28 of the alphabet, which have diacritical points.</p>
      <p>The Arabic script has no capital letters, and Arabic
characters can include a loop that takes different forms.</p>
      <p>In addition, Arabic script varies vertically and horizontally,
because of the presence of horizontal and vertical ligatures
between characters of the same word.</p>
      <p>An Arabic word doesn't have a fixed length. It may include
one or more pseudo-words, called PAWs (Pieces of Arabic Word),
each containing a different number of characters.</p>
      <p>The presence of pseudo-words in Arabic script increases
the complexity of its segmentation. These PAWs mislead
segmentation algorithms because they introduce
large intra-word spaces of variable length, compared to
the intra-word spaces in Latin.</p>
      <p>The Latin script is bicameral: each character has two
forms, one called lowercase and the other uppercase or
capital. In general, each grapheme has both forms,
with a few exceptions that vary from one Latin-based script to another.</p>
      <p>The Latin alphabet has 26 basic letters. In uppercase form,
letters change shape and size.</p>
      <p>Unlike the Arabic alphabet, Latin alphabet consists of two
types of graphemes, vowels and consonants. Latin characters
are composed of 5 vowels presented in Table 2 and 21
consonants listed in Table 3.</p>
      <p>The Latin alphabet is one of the alphabets richest in
national variations because of its geographic and temporal
spread. Each Latin-based script builds on the fundamental letters of
the Latin alphabet, but it may have specific letters, some
considered variants of the basic ones and others
considered new letters. Table 4 shows the different forms of
a Latin grapheme.</p>
      <p>As in the Arabic script, several variants of Latin characters
carry diacritical signs, such as points above the body of the
character, accents (acute, grave, circumflex),
tilde, etc. However, only two basic characters have diacritical points.</p>
      <p>The Latin script is written from left to right. It is
non-cursive: in its printed form, its letters are isolated from one
another, separated by intra-word spaces. The Latin alphabet
also has many loops that can take different forms.</p>
      <p>Segmenting a document into words is an important phase
of the document recognition process. This phase is
crucial in the case of multilingual documents, where it becomes
mandatory to segment the document and identify its words
individually.</p>
      <p>This step is the foundation of all the following steps: the
segmented words become the inputs to the other steps of the
recognition process. Despite the diversity of segmentation
approaches and the wealth of segmentation methods,
document segmentation, especially into words, remains
an open field and an active line of research.
Many researchers have worked in this direction;
from our literature review, we mention the most interesting
works below.</p>
      <p>• [Ma and Doermann, 03] used the Docstrum algorithm
of O'Gorman for the segmentation of bilingual
documents, applying it to Arabic-English,
Chinese-English, English-Hindi, and Korean-English dictionaries.
This algorithm is a bottom-up approach based on
computing the k nearest neighbors of each
connected component of the document. After
noise removal, the connected components are
separated into two groups according to a factor derived
from the proportion of character sizes. One group
consists of the dominant characters and the other
of the characters of titles and section headers.
Then, for each connected component, the k nearest
neighbors are found; each pair of neighbors
has an associated angle and distance. By
grouping the components according to these features,
the geometric areas of the document's physical
structures can be determined. The
proposed method is independent of changes in
document orientation and of the inter-word
spacing. However, the value of k depends on the
structure of the document.</p>
      <p>• [Dhandra and al., 07] segmented bilingual
documents containing one of India's regional languages
(Hindi, Kannada and Tamil) and English numerals. Their
method first segments the text into
lines; each line is then projected vertically
and segmented into words by analyzing the
valleys of the vertical projection that delimit the
different words.</p>
      <p>• The work of [Chanda and al, 07] presents a
segmentation method for bilingual documents containing
Thai and English words. Their method encodes the
different lines of text according to the position of black
pixels in each line. After segmenting the document
into lines, the method goes through the histogram of each
vertical line (column) and produces a 0 if it encounters two black
pixels or fewer, and a 1 otherwise. The resulting
chain is then analyzed: if there is a run of 0s of
length at least 2 * k1, the midpoint of the run is taken
as the boundary for word segmentation. The value of
k1 is an estimate of the white gap between two
consecutive characters of a document line.</p>
      <p>• [Rezaee and al., 09] proposed a word segmentation
method for bilingual documents containing English and
Farsi scripts. Their method is based on directional
projections of the image and on the analysis of some
attributes, such as the gaps between words, together with
thresholding of the peak distribution, in order to segment the
text lines into words.</p>
      <p>• [Da Silva and al., 11] proposed a method of word
segmentation for Latin documents containing both
handwritten and printed text. This method is based on
segmenting the text into connected components and
extracting them by cropping a bounding area. After
extracting the components, the method merges
near neighbors that lie on the same line and whose
bounding boxes are separated by a distance less than a
threshold “th”, calculated from k, the number of frames,
and Li, the widths of all the frames of the image.</p>
      <p>• [Haboubi and al., 11] proposed a segmentation method
for bilingual documents containing Arabic and Latin
script, based on the use of mathematical morphology to
delimit the different words in the text. This method uses
morphological dilation with a line structuring
element. They apply sequential dilations, increasing the
size of the structuring element each time, in order to
determine a threshold that separates the spaces between
words from intra-word spaces. This threshold
corresponds to the dilation order at which the number of
connected components has a zero standard deviation.</p>
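      <p>The column-encoding rule summarized above for [Chanda and al, 07] can be illustrated with a minimal Python sketch. It is only a reading of that summary, not the cited authors' implementation. The function name chanda_word_boundaries, the representation of a text line as nested lists of 0/1 values with text pixels equal to 1, and the choice to treat the line margins like any other run of zeros are assumptions made for illustration.</p>
      <preformat><![CDATA[
```python
def chanda_word_boundaries(line_image, k1):
    """Encode each column as 0 or 1 and return (code, boundaries).

    line_image: list of rows of 0/1 values, text pixels equal to 1 (assumed).
    k1: estimated white gap between two consecutive characters.
    """
    height = len(line_image)
    width = len(line_image[0])
    # A column with two text pixels or fewer is coded 0, otherwise 1.
    code = []
    for x in range(width):
        column_sum = sum(line_image[y][x] for y in range(height))
        code.append(0 if column_sum <= 2 else 1)

    boundaries = []
    x = 0
    while x < width:
        if code[x] == 0:
            start = x
            while x < width and code[x] == 0:
                x += 1
            # A run of zeros at least 2 * k1 long marks a word boundary,
            # placed at the midpoint of the run.
            if x - start >= 2 * k1:
                boundaries.append((start + x - 1) // 2)
        else:
            x += 1
    return code, boundaries
```
]]></preformat>
      <p>In this sketch a narrow gap, shorter than 2 * k1 columns, is left inside the word, while a wider gap yields one boundary position.</p>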
      <sec id="sec-2-1">
        <title>IV. PROPOSED APPROACH</title>
        <p>The approach developed for the word segmentation of
printed bilingual documents includes several steps (Figure 4).
Starting from a document image, we begin with a pre-processing step
that prepares the scanned document for segmentation, and then
we move to the detection and extraction of text lines. After that,
we analyze each line separately and extract the different
words present in the document.</p>
        <p>The document image is the result of an acquisition step
using a scanner. Our approach places little emphasis on
pre-processing, because the proposed system is meant to work with
images already pre-processed by a dedicated document
pre-processing system, which handles tasks such as removing the noise
sometimes introduced during scanning, skew correction,
deletion of diacritics, etc.</p>
        <p>However, our approach preserves the amount of
information present in the text: we work with document
images that contain diacritical signs, given their important role
in the understanding of the text, although their presence may
increase the complexity of the segmentation task because of
problems encountered in the detection and extraction of lines.
Indeed, in some writing styles, diacritics may exceed the upper
or the lower limit of the line, which can cause errors in the
line detection step.</p>
        <p>In our case, the pre-processing step is limited to the
binarization of the document image and the inversion
of black and white pixel values, in order to prepare the document for
the line detection step.</p>
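        <p>As an illustration of this limited pre-processing, the following Python sketch binarizes a grayscale image and inverts it in one pass, so that text pixels become 1 and the background becomes 0. The text does not say which binarization method is used; the fixed global threshold of 128, the nested-list image representation, and the function name binarize_and_invert are assumptions made for this example.</p>
        <preformat><![CDATA[
```python
def binarize_and_invert(gray, threshold=128):
    """Return a 0/1 image in which dark (text) pixels are 1.

    gray: list of rows of grayscale values in 0..255 (assumed layout).
    threshold: assumed fixed global threshold; the paper does not give one.
    """
    return [[1 if value < threshold else 0 for value in row] for row in gray]
```
]]></preformat>
        <p>With this convention, counting text pixels in a row or a column reduces to summing the values, which is convenient for the projection steps that follow.</p>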
      </sec>
    </sec>
    <sec id="sec-3">
      <title>B. Lines detection</title>
      <p>This phase is rather difficult in the case of bilingual
documents because of the large variability between Arabic and
Latin printed scripts, and it becomes more complex with the
presence of diacritical marks. The morphological study of
Latin and Arabic scripts shows the presence of a significant
number of diacritical signs.</p>
      <p>We chose the projection method to delimit the
horizontal lines. This method suits our segmentation needs
because we handle text documents
with a simple structure. Our approach scans
the image horizontally and counts the black pixels
in each row of the matrix representing the image. We then
analyze the resulting projection histogram: if the number of
black pixels changes from 0 to a positive value,
this position is the lower limit of a line. Each time, we keep the
positions of the white areas, which are later used to crop the
image into its different lines.</p>
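      <p>The projection-based line detection described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: the image is a nested list of 0/1 values in which text pixels are 1 (the paper counts black pixels; after inversion they are the non-zero ones), and the names detect_lines and crop_lines are hypothetical. Each line is reported as a pair of row indices, with the second index exclusive.</p>
      <preformat><![CDATA[
```python
def detect_lines(image):
    """Return (first_row, end_row) pairs for each text line; end_row is exclusive."""
    # Horizontal projection: number of text pixels in each row.
    profile = [sum(row) for row in image]
    lines = []
    start = None
    for y, count in enumerate(profile):
        if count > 0 and start is None:
            start = y                      # transition 0 -> positive: a line begins
        elif count == 0 and start is not None:
            lines.append((start, y))       # transition positive -> 0: the line ends
            start = None
    if start is not None:                  # line touching the last image row
        lines.append((start, len(profile)))
    return lines


def crop_lines(image, lines):
    """Cut the image into one sub-image per detected line."""
    return [image[first:end] for first, end in lines]
```
]]></preformat>
      <p>Diacritics that are separated from the body of a line by an empty row would be reported as a line of their own in this sketch, which is one form of the difficulty mentioned above.</p>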
    </sec>
    <sec id="sec-4">
      <title>C. Words detection and extraction</title>
      <p>Document segmentation can operate
at different levels: characters, pseudo-words, or
words. Word-level segmentation is the most difficult of
these, because it has to differentiate
between several types of spaces (between characters, between
pseudo-words, and between words), which is not always obvious
to a word extraction system.</p>
      <p>The objective of the proposed approach is to segment the
document image in order to separate and extract the words of a
bilingual printed document. Segmentation methods typically split
documents into connected components, which are characters in the
case of a printed Latin document and pseudo-words in the case
of a printed Arabic document, because printed Latin writing is
non-cursive and printed Arabic writing is semi-cursive.</p>
      <p>Our approach uses mathematical morphology for the
elimination of intra-word spacing and the building of
connected components formed by different words in the
bilingual printed document. We used the morphological
dilation to enlarge the image by filling the holes corresponding
in our case to the intra-word spaces.</p>
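      <p>The idea of closing intra-word gaps by dilation and then reading words as connected components can be illustrated with a small pure-Python sketch. It is an illustration only: the functions dilate and connected_components are hypothetical helpers written for this example, the image is a nested list of 0/1 values with text pixels equal to 1, the origin of the structuring element is assumed to be its centre, and components are assumed to be 8-connected.</p>
      <preformat><![CDATA[
```python
from collections import deque


def dilate(image, selem):
    """Binary dilation of a 0/1 image by a 0/1 structuring element (origin at its centre)."""
    h, w = len(image), len(image[0])
    sh, sw = len(selem), len(selem[0])
    oy, ox = sh // 2, sw // 2
    offsets = [(i - oy, j - ox) for i in range(sh) for j in range(sw) if selem[i][j]]
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if image[y][x]:
                # Stamp the structuring element around every text pixel.
                for dy, dx in offsets:
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w:
                        out[ny][nx] = 1
    return out


def connected_components(image):
    """Bounding boxes (top, left, bottom, right), inclusive, sorted by left edge."""
    h, w = len(image), len(image[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if image[y][x] and not seen[y][x]:
                seen[y][x] = True
                queue = deque([(y, x)])
                top, left, bottom, right = y, x, y, x
                while queue:  # breadth-first flood fill of one component
                    py, px = queue.popleft()
                    top, bottom = min(top, py), max(bottom, py)
                    left, right = min(left, px), max(right, px)
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = py + dy, px + dx
                            if (0 <= ny < h and 0 <= nx < w
                                    and image[ny][nx] and not seen[ny][nx]):
                                seen[ny][nx] = True
                                queue.append((ny, nx))
                boxes.append((top, left, bottom, right))
    return sorted(boxes, key=lambda box: box[1])
```
]]></preformat>
      <p>On a one-row example with gaps of 1, 4 and 1 columns, a horizontal element three pixels wide fills the two narrow gaps and leaves the wide one open, so four components become two. The bounding boxes of the dilated components can then be used to crop the words from the undilated line.</p>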
      <p>To achieve this goal, we must determine the best structuring
element, one able to join the different characters of a word without
joining words together. At this level, two major problems
appear. The first one is the size of the structuring element and the
second one is its shape. The determination of these two
characteristic features of the structuring element is the
foundation of our work.</p>
      <p>Choosing a single size of structuring element for a whole
document cannot give good segmentation, because
intra-word spaces differ from one font to another and depend
on the font size. Similarly, the spaces between words
depend on the text alignment, especially in the case of justified
text. Moreover, a document can contain different fonts and
sizes.</p>
      <p>The shape of the structuring element addresses the problem of
diacritics being extracted as separate words, which arises because
our method doesn’t remove these signs during pre-processing. One
solution to this problem is to attach diacritics to their words. We
therefore opted for shapes that have a height, and searched for the
height that solves the problem with minimal
changes to the line.</p>
      <p>The approach proceeds line by line, finding each
time the size and shape of the dilation's structuring element
that separate and extract the different words of a document line
correctly and with minimal changes.
To this end, we propose three methods to calculate the size of the
structuring element and select four specific shapes for the
structuring element.</p>
      <p>Methods for calculating the size of the structuring
element</p>
      <p>After the detection of spaces in each line of the document,
we determine the size of the structuring element according to
the three proposed methods.</p>
      <sec id="sec-4-1">
        <title>Method based on median calculation</title>
        <p>This method first eliminates duplicate space
values in the line under consideration, then sorts the resulting list of
spaces in ascending order so that these values can be
interpreted.</p>
        <p>This list reflects the nature of spaces contained in the
processed image. It begins with the relatively small areas,
which actually represent the spaces between characters in a
word, and reaches the largest gap present in the line. This
method is based on the fact that the threshold space, able to
stick the characters of a word without sticking the words
together, has an intermediate value between the lower and
upper bound of the new list of spaces. The median value of this
list is considered as the size of the structuring element of the
dilation.</p>
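        <p>The median-based rule can be written in a few lines of Python. In this sketch the function name size_by_median is hypothetical and the input is assumed to be the list of space lengths, in pixels, measured in one line. The text does not say how the median is taken when the number of distinct values is even; the sketch uses the standard library convention, which averages the two middle values.</p>
        <preformat><![CDATA[
```python
from statistics import median


def size_by_median(space_lengths):
    """Size of the structuring element: median of the distinct space lengths."""
    # Drop duplicate values, then sort in ascending order as described in the text.
    distinct = sorted(set(space_lengths))
    # statistics.median averages the two middle values for an even count
    # (an assumption of this sketch) and raises StatisticsError on an empty list.
    return median(distinct)
```
]]></preformat>
        <p>The result may be fractional, so a caller would typically round it before building a structuring element.</p>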
      </sec>
      <sec id="sec-4-2">
        <title>Method based on the average calculation</title>
        <p>Our approach consists in testing all the combinations of
the methods for calculating the size of the structuring element with
its shapes. Each time, we fix the calculation method and
vary the shape. We begin by applying a combination to
printed Latin and Arabic documents. If we get good results, we
continue testing on mixed documents. Otherwise, we consider
it unnecessary to apply the combination to bilingual
documents.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Calculation of the structuring element’s size</title>
      <p>To calculate the size of the structuring element, we started
by determining the list of spaces in each line, then, we
developed three different methods to solve this problem.
</p>
      <sec id="sec-5-1">
        <title>Identification of document spaces</title>
        <p>Our approach proceeds by analyzing the extracted lines.
This analysis is based on the calculation of the vertical
projection histogram to determine the values of the different
spaces in the document. We traverse each line column by column and
count the black pixels present in each column of
the line’s image.</p>
        <p>The next step is to analyze the vertical histogram obtained
and to determine the positions and values of the inter-word and
intra-word spaces: if the number of black pixels becomes
zero after a sequence of non-zero values, this change
marks the presence of a space in the line. We store its
position and calculate its length, which is
the distance between
the position where the space begins and the position of the
first non-zero number of black pixels encountered afterwards
while running through the vertical projection histogram. At the end,
we obtain a list of the space values present in the
line under consideration and a list of their positions.</p>
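        <p>The detection of spaces from the vertical projection can be sketched in Python as follows. The function name find_spaces and the nested-list 0/1 representation of a line (text pixels equal to 1) are assumptions made for illustration. As in the description above, a space starts when the column count drops to zero after inked columns and ends at the next inked column, so the left and right margins of the line are not reported.</p>
        <preformat><![CDATA[
```python
def find_spaces(line_image):
    """Return (positions, lengths) of the empty-column runs between inked columns."""
    height = len(line_image)
    width = len(line_image[0])
    # Vertical projection: number of text pixels in each column.
    profile = [sum(line_image[y][x] for y in range(height)) for x in range(width)]

    positions, lengths = [], []
    start = None        # first column of the space currently being measured
    seen_ink = False    # becomes True once the first inked column is met
    for x, count in enumerate(profile):
        if count > 0:
            if start is not None:
                # First non-zero column after a space: close the space.
                positions.append(start)
                lengths.append(x - start)
                start = None
            seen_ink = True
        elif seen_ink and start is None:
            start = x   # the count became zero after a run of inked columns
    return positions, lengths
```
]]></preformat>
        <p>The list of lengths returned here is the input assumed by the size calculation methods.</p>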
        <p>This second method, based on the average, is similar to the
median-based one in that it first determines the distinct values of
the spaces. It is based on the
assumption that the threshold value of the structuring element of the
dilation is proportional to the number of spaces present in the
given image and to their lengths. This method sets the size
of the structuring element to the average length of the distinct
spaces in the line being processed.</p>
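        <p>A minimal Python sketch of the average-based rule follows. The function name size_by_average is hypothetical, and the input is assumed to be the list of space lengths measured in one line. Following the description, duplicates are removed before averaging.</p>
        <preformat><![CDATA[
```python
def size_by_average(space_lengths):
    """Size of the structuring element: mean of the distinct space lengths."""
    distinct = set(space_lengths)  # keep each space value once
    if not distinct:
        raise ValueError("the line contains no space")
    return sum(distinct) / len(distinct)
```
]]></preformat>
        <p>Compared with the median, the average is pulled towards unusually wide spaces, for instance those produced by justified text.</p>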
      </sec>
      <sec id="sec-5-2">
        <title>Method based on the calculation of the difference between the values of spaces</title>
        <p>This method is based on detecting the largest jump
in length between spaces. It works on the entire list of spaces in
the line to be processed. From this list, it calculates the
lengths of the jumps between spaces. Then, it scans the new list
to find the greatest difference between spaces. The
threshold size of the structuring element is necessarily located
between the two spaces that generated the biggest jump.
The difference method sets the size of the structuring
element to the average of the two spaces that produce the
largest jump.</p>
        <p>In summary, this calculation method
generates a list of jumps between the space lengths,
looks for the biggest jump, finds the two spaces that produce this
jump, and calculates their average. The size of the structuring
element corresponds to this average.</p>
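        <p>The difference-based rule can be sketched in Python as follows. The function name size_by_largest_jump is hypothetical. The text does not state in which order the spaces are compared; the sketch assumes that jumps are taken between consecutive values of the list sorted in ascending order, and that the first largest jump is used when several are equal.</p>
        <preformat><![CDATA[
```python
def size_by_largest_jump(space_lengths):
    """Size of the structuring element: mean of the two spaces around the largest jump."""
    ordered = sorted(space_lengths)  # assumption: jumps are measured on the sorted list
    if len(ordered) < 2:
        raise ValueError("at least two spaces are needed to measure a jump")
    # Jump between each pair of consecutive space lengths.
    jumps = [b - a for a, b in zip(ordered, ordered[1:])]
    # Index of the largest jump (the first one in case of ties).
    i = max(range(len(jumps)), key=jumps.__getitem__)
    return (ordered[i] + ordered[i + 1]) / 2
```
]]></preformat>
        <p>For the space lengths 2, 3, 2, 11, 3, 12 and 2, the largest jump lies between 3 and 11, so the sketch returns 7: gaps of up to 3 pixels would be treated as intra-word and gaps of 11 pixels or more as inter-word.</p>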
      </sec>
    </sec>
    <sec id="sec-6">
      <title>Determination of the structuring element’s shape</title>
      <p>After calculating the size of the structuring element, the
next phase is to choose a suitable shape that allows
the bilingual printed document to be segmented correctly.</p>
      <p>The structuring element can have several forms such as
square, diamond, polygon, Euclidean disc, line, point pairs,
rectangle, etc.
</p>
      <sec id="sec-6-1">
        <title>The diamond shape</title>
        <p>In this approach, we focused on four specific shapes
of the structuring element: the first is the diamond,
the second is the square, and the third and fourth are variants of the
rectangle with different input parameters.</p>
        <p>We used these shapes because of the presence of diacritical
signs in the documents to be segmented. These shapes have in
common a height proportional to the size of the structuring
element, an important feature in our approach because it attaches
the different diacritical marks to their words.</p>
        <p>For each line, the diameter of the diamond is equal to the
size of the structuring element determined by one of the three
calculation methods presented above.</p>
        <p>The following table shows the results found by combining
the diamond shape with the three methods for calculating the
structuring element.</p>
        <p>For each line, the side of the square is equal to the size of
the structuring element determined by one of the three calculation
methods presented above.</p>
        <p>The following table shows the results found by combining
the square shape with the three methods of calculating the
structuring element.</p>
        <p>For each line, the width of the rectangle is equal to the size
of the structuring element determined by one of the three
calculation methods presented above, and its height is equal to two
times this value.</p>
        <p>The following table shows the results found by combining
the rectangle shape of height two times the calculated size with
the three methods of calculating the structuring element.</p>
        <p>We note that the best correct-extraction rates obtained are
94.86% for Arabic and 97.05% for Latin. These rates are achieved
by combining the method that calculates the structuring
element’s size from the difference between the space
values with the rectangle shape whose height is equal to 3 times
the size of the structuring element.</p>
        <p>For each line, the width of the rectangle is equal to the size
of the structuring element determined by one of the three
calculation methods presented above, and its height is equal to 3 times
this value.</p>
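        <p>The four shapes considered here can be built as small 0/1 matrices, as in the following Python sketch. The helper names diamond, square and rectangle are hypothetical, the size is assumed to have been rounded to a positive integer beforehand, and the exact discretization of the diamond (a diamond inscribed in a size by size box) is an assumption of this example, since the text only states that its diameter equals the calculated size.</p>
        <preformat><![CDATA[
```python
def square(size):
    """Square whose side equals the calculated size."""
    return [[1] * size for _ in range(size)]


def rectangle(size, height_factor):
    """Rectangle of width `size` and height `height_factor * size` (factor 2 or 3 in the text)."""
    return [[1] * size for _ in range(height_factor * size)]


def diamond(size):
    """Diamond inscribed in a size x size box (assumed discretization)."""
    c = (size - 1) / 2  # centre of the box, possibly between two pixels
    return [
        [1 if abs(i - c) + abs(j - c) <= size / 2 else 0 for j in range(size)]
        for i in range(size)
    ]
```
]]></preformat>
        <p>Because the rectangles are two or three times taller than they are wide, they reach diacritics above or below the body of a word without extending further horizontally than the calculated size.</p>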
        <p>The following table shows the results found by combining
the rectangle shape of height three times the calculated size
with the three methods of calculating the structuring element.
Applying this combination to the sample of printed
bilingual documents gave a correct-extraction rate of
94.85%. This result is explained by how well the method for
calculating the size of the structuring element adapts to the changes in
space lengths across the lines of the document, and by a height,
proportional to the size, that covers the different distances between
diacritics and their words.</p>
        <p>The figure shows a sample run of a line from a printed
bilingual document with diacritics.</p>
        <p>The word segmentation of the printed bilingual document
gave 14 words, which correctly corresponds to the words found
in the line of the document introduced.</p>
      </sec>
      <sec id="sec-6-2">
        <title>VI. CONCLUSION AND PERSPECTIVES</title>
        <p>The separation and extraction of words in a printed
bilingual document constitutes the main contribution of our
work, which also covered the field of document recognition, its
different stages, and the various available
methods for segmenting documents into words.</p>
        <p>After studying the Arabic and Latin scripts, we
proceeded to the implementation of our approach. We
developed different methods for calculating the size of the
structuring element of the morphological dilation, combined them with
different shapes, and tested them on samples of printed Arabic and
Latin documents. We then compared the results, and
the best performing combination was chosen for testing on printed
bilingual documents, the subject of our study.</p>
        <p>Although the result obtained by this method for the
separation of words is convincing, it has some limitations:
• The sample size is 945 words for the printed Latin
documents, 545 words for the printed Arabic documents and 564
words for the printed bilingual documents;
• This approach only deals with printed documents;
• This approach is limited to bilingual Arabic and Latin
documents.</p>
        <p>As future work, we expect to enlarge the sample size to
better test the performance of the proposed method. We also
plan to extend our method to the processing of
handwritten bilingual text documents, to mixed bilingual
documents (both handwritten and printed forms in the same
document), to bilingual documents of any
kind, and even to multilingual documents.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [Touj and al., 04] :
          <string-name>
            <given-names>Sofiene</given-names>
            <surname>Touj</surname>
          </string-name>
          , Najoua Essoukri Ben Amara, Hamid Amiri, «
          <article-title>Reconnaissance de l'Ecriture Arabe Imprimée par Transformée de Hough Généralisée »</article-title>
          ,
          <source>Conférence Internationale Francophone sur l'Ecrit et le Document (CIFED 04)</source>
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [Ma and Doermann, 03] : Huanfeng Ma, David Doermann, «
          <article-title>Gabor Filter Based Multi-class Classifier for Scanned Document Images »</article-title>
          ,
          <source>Proceedings of the Seventh International Conference on Document Analysis and Recognition (ICDAR'03)</source>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [Dhandra and al., 07] :
          <string-name>
            <given-names>B.V.</given-names>
            <surname>Dhandra</surname>
          </string-name>
          , Mallikarjun Hangarge,
          <string-name>
            <given-names>Ravindra</given-names>
            <surname>Hegadi</surname>
          </string-name>
          and
          <string-name>
            <given-names>V.S.</given-names>
            <surname>Malemath</surname>
          </string-name>
          , «
          <article-title>Word Level Script Identification in Bilingual Documents through Discriminating Features »</article-title>
          ,
          <source>International Conference on Signal Processing, Communications and Networking (ICSCN '07)</source>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [Chanda and al, 07] :
          <string-name>
            <given-names>S.</given-names>
            <surname>Chanda</surname>
          </string-name>
          , Oriol Ramos Terrades and U. Pal, «
          <article-title>SVM Based Scheme for Thai and English Script Identification »</article-title>
          ,
          <source>Ninth International Conference on Document Analysis and Recognition (ICDAR 2007)</source>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [Rezaee and al., 09] : Hamideh Rezaee, Masoud Geravanchizadeh, Farbod Razzazi, «
          <article-title>Automatic Language Identification of Bilingual English and Farsi Scripts »</article-title>
          ,
          <source>IEEE International Conference on Application of Information and Communication Technologies (AICT)</source>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [Da Silva and al., 11] : Lincoln Faria da Silva, Aura Conci, Angel Sanchez, «
          <article-title>Word-Level Segmentation in Printed and Handwritten Documents »</article-title>
          ,
          <source>IEEE 18th International Conference on Systems, Signals and Image Processing (IWSSIP)</source>
          , Sarajevo,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [Haboubi and al., 11] : Sofiene Haboubi, Samia Snoussi Maddouri, Hamid Amiri, «
          <article-title>Discrimination between Arabic and Latin from bilingual documents »</article-title>
          ,
          <source>IEEE International Conference on Communications, Computing and Control Applications (CCCA)</source>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>