<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
<article-title>Recognition of Characters from Ancient Manuscripts - A Review</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Ketki R. Ingole</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Pritish A. Tijare</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff1">
          <label>1</label>
          <institution>Sipna College of Engineering &amp; Technology</institution>
          ,
          <country country="IN">India</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2022</year>
      </pub-date>
      <fpage>49</fpage>
      <lpage>54</lpage>
      <abstract>
        <p>Nowadays, Intelligent Character Recognition (ICR) is used in applications such as banking, schools and colleges, and online business, where it must read text written in different styles and formats. Ancient manuscripts are one area where automatic character recognition is required. India has one of the richest collections of manuscripts, held in museums, national libraries, universities, and temples and written in languages such as Sanskrit, Tamil, Marathi, and Telugu, but most of them survive only in degraded form. Major steps have been taken by the government, institutes, and other organizations to preserve these manuscripts, and research in image processing has contributed substantially to that effort. The manuscripts were written in various scripts and writing styles, mostly in a continuous manner without lifting the pen, which gives the characters a cursive appearance. Character recognition systems work on ordinary handwritten documents but face difficulties reading cursive characters. With the help of artificial intelligence, systems are able to recognize characters through feature extraction and classification techniques. The main challenges in recognizing characters from ancient manuscripts are degraded pages, differing writing styles, and similar character forms. This paper reviews work on recognizing characters from ancient documents.</p>
        <p>Keywords: Intelligent Character Recognition, Ancient Manuscripts, Artificial Intelligence, Feature Extraction and Classification, Degraded Manuscripts.</p>
        <p>ACI'22: Workshop on Advances in Computation Intelligence, its Concepts &amp; Applications at ISIC 2022, May 17-19, Savannah, United States.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Ancient manuscripts are a repository of cultural heritage. They cultivate knowledge in the form of scripts and stories of different civilizations from different eras. India is known for the versatility of its cultural heritage, and that heritage has been cultivated through these ancient manuscripts. Indian ancient manuscripts store knowledge of astronomy, cosmology, arts, medicine, mathematics, and science, cultivated from century to century and passed from generation to generation.</p>
      <p>In India, temples and museums are sources of such manuscripts. In earlier ages, Indian emperors and temple authorities took charge of these manuscripts, and degraded copies were destroyed only after they had been recopied. As time passed, this cycle of restoration was broken, resulting in the degradation of manuscripts and the loss of the knowledge they contain.</p>
      <p>2020 Copyright for this paper by its authors.</p>
      <p>Many of these manuscripts now stand at the edge of degradation. Nowadays the government, foundations, and institutes have come forward and are taking efforts to preserve this treasure of knowledge. Such organizations preserve the cultural heritage in digital form, either by scanning or through digital photography. With the help of digital image processing, manuscripts are not only preserved but can also be restored toward their original form. Most manuscripts were written on leaf, paper, or metal sheets. As the preservation cycle was broken, these manuscripts face degradation, mostly due to the natural end of life of paper and leaf, fungus attack, and cockroaches, and these are the main hurdles in converting them into digital format. Although advanced photography and scanning equipment are now available, the digital form of many manuscripts remains unreadable: the text is unclear due to ink seepage, dirt, holes, and cracks; scanning demands that the deteriorated manuscripts lie flat; and the light source during image acquisition is often uneven. All of this makes extracting characters from digitized manuscripts difficult.</p>
      <p>Image enhancement techniques help to improve the digital form of manuscripts, which in turn enables retrieval of text from degraded pages. Enhancement reduces the hurdles of text extraction and increases readability, but adequate methods for producing consistently high-quality results are still to be found.</p>
      <p>Manuscripts were written in various scripts, usually in ink, cursively and continuously. When recognizing text from such manuscripts, the cursive writing, varied writing styles, uneven alignment, and absence of punctuation marks all create problems. Most manuscripts were also written on both sides, so they suffer from bleed-through effects.</p>
      <p>Character recognition is a vast area of research with a large body of work, but character recognition for ancient scripts still requires more focus, as few people remain who can read such scripts and retrieve the knowledge they hold. Many organizations and government bodies are working on this, but the presence of overlapping lines, different writing styles, and similar character shapes increases the complexity of recognition.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Need To Explore The New Approach</title>
      <p>Most manuscripts are unreadable due to ink seepage and background impressions, known as the bleed-through effect. Bleed-through makes it difficult to separate the text from the background of the manuscript image. Discoloration further causes the text and background to take on the same color, leaving the text unclear to read.</p>
      <p>Various image processing techniques are available to improve the readability of manuscripts, but such methods are still not adequate to produce quality results for some documents. Most existing work addresses noise removal and character recognition, so there is scope to improve manuscript image quality and make the material available to researchers.</p>
      <p>At the same time, differences in handwriting are one of the hurdles in correlating characters. Pattern recognition is used to reduce the ambiguity in identifying characters and words in ancient documents.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Literature Review</title>
      <p>Researchers have developed many methods to improve the quality of manuscript images and to recognize characters; a short report is presented in this section.</p>
      <p>
        Earlier manuscript image enhancement algorithms aim to retrieve text from manuscripts with rough backgrounds. First,
        <xref ref-type="bibr" rid="ref1">(N. Otsu, 1979)</xref>
        proposed a thresholding method for gray-scale images that maximizes the discriminant measure between pixel classes. The method is time consuming due to its inefficient formulation, and with a fixed threshold it is difficult to achieve consistent quality. Later work addressed colored document images: DjVu
        <xref ref-type="bibr" rid="ref2">(Bottou, 1998)</xref>
        , in the context of compression, implements an algorithm that efficiently separates foreground from background.
      </p>
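<p>As a rough illustration of the idea, Otsu's criterion can be sketched in a few lines of NumPy. This is a minimal sketch, not the paper's exact formulation; the function and variable names are ours.</p>

```python
import numpy as np

def otsu_threshold(gray):
    """Pick the 8-bit threshold that maximizes the between-class
    variance of the gray-level histogram (Otsu's criterion)."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    prob = hist / hist.sum()
    omega = np.cumsum(prob)                 # P(class 0) per threshold
    mu = np.cumsum(prob * np.arange(256))   # cumulative mean
    mu_t = mu[-1]                           # global mean
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_t * omega - mu) ** 2 / (omega * (1.0 - omega))
    return int(np.argmax(np.nan_to_num(sigma_b)))

# A bimodal image separates cleanly: dark ink vs. light page.
img = np.where(np.arange(100) < 50, 40, 200).astype(np.uint8).reshape(10, 10)
t = otsu_threshold(img)
binary = (img > t).astype(np.uint8) * 255
```

<p>With a single global threshold like this, a page whose illumination varies is binarized inconsistently, which is exactly the weakness the adaptive methods address.</p>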
      <p>
        Most manuscripts were written on both sides, so the back impression is a major hurdle when restoring such documents. Direct image matching and directional wavelet methods were proposed to reduce background noise and the bleed-through effect
        <xref ref-type="bibr" rid="ref3">(Q. Wang, 2003)</xref>
        . An adaptive serialized k-means method that separates text using local information was proposed for low-quality color document images
        <xref ref-type="bibr" rid="ref4">(Leydier, Emptoz, 2004)</xref>
        .
      </p>
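<p>The directional idea can be illustrated with one level of a 2-D Haar decomposition. This is a sketch of the general wavelet mechanism, not the exact filters of the cited paper.</p>

```python
import numpy as np

def haar2d_level(img):
    """One level of the 2-D Haar transform. The LH, HL and HH subbands
    respond to horizontal, vertical and diagonal structure, which is
    the directional separation wavelet-based cleanup relies on."""
    a = img[0::2, 0::2].astype(float)
    b = img[0::2, 1::2].astype(float)
    c = img[1::2, 0::2].astype(float)
    d = img[1::2, 1::2].astype(float)
    ll = (a + b + c + d) / 4   # coarse approximation
    lh = (a + b - c - d) / 4   # horizontal detail
    hl = (a - b + c - d) / 4   # vertical detail
    hh = (a - b - c + d) / 4   # diagonal detail
    return ll, lh, hl, hh

# A horizontal ink stroke shows up in LH but not in HL.
img = np.zeros((8, 8))
img[3:5, :] = 255
ll, lh, hl, hh = haar2d_level(img)
```

<p>Because the subbands isolate orientation, a back impression with a dominant stroke direction can be attenuated in the matching subband before reconstruction.</p>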
      <p>
        <xref ref-type="bibr" rid="ref5 ref6">(Zhixin Shi and Venu Govindaraju, 2005)</xref>
        proposed an algorithm that uses local adaptive binarization to normalize the background light intensity and enhance images.
      </p>
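<p>A minimal sketch of the background-normalization idea follows, assuming a box-filter background estimate; the cited algorithm is more elaborate, and the function names here are ours.</p>

```python
import numpy as np

def normalize_background(gray, k=31):
    """Estimate the slowly varying background with a k-by-k local mean
    (computed via an integral image), then divide it out so that the
    page illumination becomes flat."""
    g = gray.astype(float)
    pad = k // 2
    p = np.pad(g, pad, mode="edge")
    ii = np.pad(p, ((1, 0), (1, 0))).cumsum(0).cumsum(1)  # integral image
    h, w = g.shape
    box = (ii[k:k + h, k:k + w] - ii[:h, k:k + w]
           - ii[k:k + h, :w] + ii[:h, :w]) / (k * k)      # local mean
    out = g / np.maximum(box, 1e-6)                        # flatten lighting
    return np.clip(out * 128, 0, 255).astype(np.uint8)

# A uniformly lit page maps to a uniform mid-gray after normalization.
flat = np.full((40, 40), 100, dtype=np.uint8)
res = normalize_background(flat, k=7)
```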
      <p>
        In another paper,
        <xref ref-type="bibr" rid="ref5 ref6">(Shi, Setlur, Govindaraju, 2005)</xref>
        proposed an image enhancement algorithm for color images of palm leaf manuscripts. Palm leaf manuscripts are available in various libraries; degraded copies are preserved by applying chemicals, which reduces their readability, and the images of ancient, degraded palm leaves are enhanced using this algorithm. Furthermore, a fuzzy logic method was applied to ancient documents (J. M. Gil, 2006). The authors investigated a method for identifying the distance between letters and the character styles, using Gabor filters for feature extraction and fuzzy logic for classification. The Gabor filter extracts local information from different regions of the image, and an aspect ratio is calculated for each character. A hybrid method (Wafa Boussellaa, 2008) was proposed for image segmentation of historical Arabic manuscripts; it combines normalization and clustering algorithms for light intensity normalization and foreground-background separation.
      </p>
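<p>The Gabor feature extraction step can be sketched as below. This is a generic Gabor filter bank with illustrative parameters, not the cited paper's exact configuration; the helper names are ours.</p>

```python
import numpy as np

def gabor_kernel(ksize=15, sigma=3.0, theta=0.0, lam=6.0, gamma=0.5):
    """Real part of a Gabor kernel at orientation theta (radians)."""
    half = ksize // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2 * sigma ** 2))
    return envelope * np.cos(2 * np.pi * xr / lam)

def gabor_features(img, n_orient=4):
    """Mean absolute filter response at n_orient orientations,
    giving a small orientation-sensitive feature vector."""
    feats = []
    for i in range(n_orient):
        k = gabor_kernel(theta=i * np.pi / n_orient)
        win = np.lib.stride_tricks.sliding_window_view(img, k.shape)
        resp = (win * k).sum(axis=(-2, -1))   # valid-mode correlation
        feats.append(np.abs(resp).mean())
    return np.array(feats)

# A vertical stroke yields a 4-value orientation signature.
glyph = np.zeros((20, 20))
glyph[:, 9:11] = 1.0
sig = gabor_features(glyph)
```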
      <p>Furthermore, a Dynamic Bayesian Network (DBN) method was used for recognizing characters (Sulem, 2008). The DBN method is efficient at recognizing broken characters and improves the readability of degraded and unclear text.</p>
      <p>Most historical documents written on both sides suffer from back impressions. To restore such documents, (Jie Wang, 2011) proposed a restoration approach based on non-rigid registration, with a directional-wavelet method to reduce the back impression. The evaluation shows 85.2% precision and an improved document appearance.</p>
      <p>A chain-code-based method (Chandure and Inamdar, 2017) was applied to both Devnagari and Modi vowel characters, combining chain code features with BPNN, KNN, and SVM classifiers. It shows better results for Devnagari vowels than for Modi vowels.</p>
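<p>For illustration, a chain-code histogram feature can be sketched as follows. This is a minimal sketch; the direction numbering is our convention, and the classifier stage (BPNN/KNN/SVM) is omitted.</p>

```python
import numpy as np

# Freeman 8-directions as (row, col) steps: 0 = east, counter-clockwise.
DIRS = [(0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1), (1, 0), (1, 1)]

def chain_code_histogram(boundary):
    """Encode an ordered boundary as Freeman chain codes, then reduce
    it to a normalized 8-bin direction histogram feature vector."""
    codes = [DIRS.index((r1 - r0, c1 - c0))
             for (r0, c0), (r1, c1) in zip(boundary, boundary[1:])]
    hist = np.bincount(codes, minlength=8).astype(float)
    return hist / hist.sum()

# Tracing a small square boundary: equal counts of E, S, W, N steps.
square = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1),
          (2, 0), (1, 0), (0, 0)]
feat = chain_code_histogram(square)
```

<p>The resulting 8-bin vector is what a classifier such as KNN or SVM would consume.</p>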
    </sec>
    <sec id="sec-4">
      <title>4. Issue Identified in Literature Review</title>
      <p>Table 1 below summarizes the issues observed in the previously developed systems for enhancement and character recognition of ancient manuscripts.</p>
      <table-wrap id="table1">
        <label>Table 1</label>
        <caption>
          <p>Issues identified in the reviewed methods.</p>
        </caption>
        <table>
          <thead>
            <tr>
              <th>Method</th>
              <th>Reference</th>
              <th>Issue identified</th>
            </tr>
          </thead>
          <tbody>
            <tr>
              <td>Global thresholding</td>
              <td>(N. Otsu, 1979)</td>
              <td>It is difficult to achieve consistent quality with a fixed threshold.</td>
            </tr>
            <tr>
              <td>Foreground-background separation (DjVu)</td>
              <td>(Bottou, 1998)</td>
              <td>Separates the foreground and background, but results are not available for ancient documents.</td>
            </tr>
            <tr>
              <td>Directional wavelets</td>
              <td>(Q. Wang, 2003)</td>
              <td>Reduces background noise and bleed-through, but a strong background impression still reduces accuracy.</td>
            </tr>
            <tr>
              <td>Serialized K-means clustering</td>
              <td>(Leydier, Emptoz, 2004)</td>
              <td>The results on ancient documents were good but computationally expensive.</td>
            </tr>
            <tr>
              <td>Background light intensity normalization</td>
              <td>(Zhixin Shi and Venu Govindaraju, 2005)</td>
              <td>If the intensity of the foreground text and background is the same, extracting the text becomes difficult.</td>
            </tr>
            <tr>
              <td>Normalization technique</td>
              <td>(Shi, Setlur, Govindraju, 2005)</td>
              <td>Generates a binarized image with text degradation.</td>
            </tr>
            <tr>
              <td>Fuzzy methods</td>
              <td>(J. M. Gil, 2006)</td>
              <td>An automatic parameter detection system with a good heuristic function is required.</td>
            </tr>
            <tr>
              <td>Normalization technique with K-means clustering</td>
              <td>(Wafa Boussellaa, 2008)</td>
              <td>Improvement is required in the segmentation process.</td>
            </tr>
            <tr>
              <td>Dynamic Bayesian networks</td>
              <td>(Sulem, 2008)</td>
              <td>Efficient at recognizing broken characters, but accurate parameter initialization is required.</td>
            </tr>
            <tr>
              <td>Feature extraction with matched control points</td>
              <td>(Jie Wang, 2011)</td>
              <td>A post-processing method is required to recover the broken foreground text.</td>
            </tr>
            <tr>
              <td>Chain code histogram</td>
              <td>(Chandure and Inamdar, 2017)</td>
              <td>A large data set is required for accurate character recognition.</td>
            </tr>
          </tbody>
        </table>
      </table-wrap>
    </sec>
    <sec id="sec-5">
      <title>5. Proposed Methodology</title>
    </sec>
    <sec id="sec-6">
      <title>5.1. Image (Color/Gray) Enhancement</title>
      <p>Historical documents generally face two types of deficiency: first, the original document is in a deteriorated condition, and second, the background is uneven after the document is converted into digital form.</p>
      <p>Enhancement techniques improve the image quality of low-contrast images. Most available image enhancement techniques reduce the uneven background; they perform well for some historical documents and remove the hurdles to extracting the text.</p>
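<p>As a minimal sketch of one such enhancement step, consider percentile-based contrast stretching; the percentile parameters here are illustrative, not prescribed by the text.</p>

```python
import numpy as np

def contrast_stretch(gray, lo_pct=2, hi_pct=98):
    """Linearly remap the 2nd-98th percentile range of a low-contrast
    scan onto the full 0-255 range, clipping the extremes."""
    lo, hi = np.percentile(gray, [lo_pct, hi_pct])
    out = (gray.astype(float) - lo) / max(hi - lo, 1e-6)
    return (np.clip(out, 0.0, 1.0) * 255).astype(np.uint8)

# A scan squeezed into [100, 150] is stretched to the full range.
scan = np.linspace(100, 150, 64, dtype=np.uint8).reshape(8, 8)
enhanced = contrast_stretch(scan)
```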
    </sec>
    <sec id="sec-7">
      <title>5.2. Character / Pattern Recognition</title>
      <p>After enhancement the image is improved, but some text may remain unrecognizable, making feature extraction for character recognition difficult. That part can be made readable using pattern recognition. Feature and property extraction from the input data is accomplished with a neural network, which automatically extracts salient features that are stable under a given degree of shift and shape variation or distortion.</p>
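<p>The shift tolerance mentioned above can be illustrated with a single convolution-plus-pooling stage, the basic mechanism of convolutional feature extractors. This is a NumPy sketch under our own names, not a full network.</p>

```python
import numpy as np

def conv2d_valid(img, k):
    """Valid-mode 2-D correlation, the building block of a conv layer."""
    win = np.lib.stride_tricks.sliding_window_view(img, k.shape)
    return (win * k).sum(axis=(-2, -1))

def shift_tolerant_feature(img, k, pool=4):
    """Convolution + ReLU + max pooling: the pooled response changes
    little when the input pattern shifts by a few pixels."""
    r = np.maximum(conv2d_valid(img, k), 0.0)
    h, w = (s - s % pool for s in r.shape)
    blocks = r[:h, :w].reshape(h // pool, pool, w // pool, pool)
    return blocks.max(axis=(1, 3))

# The same blob, shifted by one pixel, keeps its pooled peak response.
kernel = np.ones((3, 3))
a = np.zeros((12, 12)); a[2:5, 2:5] = 1.0
b = np.zeros((12, 12)); b[3:6, 3:6] = 1.0
fa = shift_tolerant_feature(a, kernel)
fb = shift_tolerant_feature(b, kernel)
```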
    </sec>
    <sec id="sec-8">
      <title>6. Conclusion</title>
      <p>In recent times, character recognition from ancient manuscripts has become a major concern, as manuscripts are repositories of knowledge. Researchers in this domain confront the challenge of analyzing characters accurately. Enhancement techniques help resolve the primary problem of uneven background intensity. Foreground and background normalization techniques separate the background from the foreground text, which increases readability. With the help of neural networks, text features are extracted for recognizing characters. Low-contrast documents, skewed images, cursive writing, and differing writing styles leave room for further improvement in accuracy.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>N.</given-names>
            <surname>Otsu</surname>
          </string-name>
          , “
          <article-title>A threshold selection method from gray-level histograms</article-title>
          ,
          <source>IEEE Transactions in Systems, Man, and Cybernetics</source>
          , vol.
          <volume>9</volume>
          , pp.
          <fpage>62</fpage>
          ,
          <year>1979</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>L.</given-names>
            <surname>Bottou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Haffner</surname>
          </string-name>
          , and P.G Howard,
          <article-title>“High Quality Document Image Compression with DjVu”</article-title>
          ,
          <source>Journal of Electronic Imaging</source>
          ,
          <year>July 1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Q.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Xia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Li</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Tan</surname>
          </string-name>
          , “
          <article-title>Document image enhancement using directional wavelets</article-title>
          ”,
          <source>IEEE Conference on Computer Vision and Pattern Recognition</source>
          , Madison, Wisconsin, USA,
          <year>June 2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Leydier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Emptoz</surname>
          </string-name>
          and
          <string-name>
            <given-names>F.L.</given-names>
            <surname>Bourgeois</surname>
          </string-name>
          ,“
          <article-title>Serialized K-Means for Adaptive Color Image Segmentation - Application to Document Images and Others</article-title>
          ”,
          <source>6th International Workshop on Document Analysis Systems</source>
          , Italy,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Zhixin</given-names>
            <surname>Shi</surname>
          </string-name>
          and Venu Govindaraju, “
          <article-title>Historical Handwritten Document Image Segmentation Using Background Light Intensity Normalization</article-title>
          ”,
          <source>SPIE Document Recognition and Retrieval XII</source>
          , San Jose, California, USA,
          <year>January 2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          Zhixin Shi, Srirangaraj Setlur and
          <string-name>
            <given-names>Venu</given-names>
            <surname>Govindaraju</surname>
          </string-name>
          , “
          <article-title>Digital Image Enhancement using Normalization Techniques and their application to Palm Leaf Manuscripts</article-title>
          ”, CEDAR, Buffalo, U.S.A.,
          <year>February 2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>[7] J. M. C. Sousa, J. M. Gil, C. S. Ribeiro and J. R. C. Pintom, “<article-title>Old Document Recognition Using Fuzzy Methods</article-title>”, <source>Intelligent Systems Technologies and Applications</source>, vol. 1, <year>2006</year>.</mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>[8] Wafa Boussellaa, “<article-title>A Methodology for the Separation of Foreground/Background in Arabic Historical Manuscripts using Hybrid Methods</article-title>”, <source>Journal of Universal Computer Science</source>, vol. 14, <year>2008</year>.</mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>[9] M. Sigelle and L. L. Sulem, “<article-title>Recognition of degraded characters using dynamic Bayesian networks</article-title>”, <source>Pattern Recognition</source>, vol. 41, issue 10, <year>2008</year>.</mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>[10] Jie Wang and Chew Lim Tan, “<article-title>Non-rigid Registration and Restoration of Double-Sided Historical Manuscripts</article-title>”, <source>International Conference on Document Analysis and Recognition (ICDAR)</source>, <year>September 2011</year>.</mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>[11] V. Inamdar and S. L. Chandure, “<article-title>Performance analysis of handwritten Devnagari and MODI Character Recognition system</article-title>”, <source>Conference on Computer Analysis of Secure Trends</source>, <year>2017</year>.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>