<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
<article-title>mOCRa: Mobile OCR Application</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Ismael Hasan</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Xose R. De La Puente</string-name>
          <email>xose.puente.romay@udc.es</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>IRLab</institution>
          ,
          <addr-line>A Coruña Univ., Campus Elviña s/n, A Coruña</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>IRLab, ICT Centre</institution>
          ,
          <addr-line>Campus Elviña s/n, A Coruña</addr-line>
        </aff>
      </contrib-group>
      <fpage>04</fpage>
      <lpage>04</lpage>
      <abstract>
        <p>In recent years, mobile phones have evolved into devices with high-resolution cameras and Internet connections. In this context, the idea of applying OCR techniques to the pictures taken with these devices arises. As a result of this idea, we have built mOCRa, the application presented in this work.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>The evolution of mobile devices has been
vertiginous: two decades ago they were bulky
devices with low battery autonomy that
could only be used to make phone calls;
nowadays, mobile phones are devices with
multimedia capabilities, Internet access, etc.
Their features include the integration of
digital cameras into the phones. With
this in mind arises the idea of using a
mobile device to extract the text from
pictures taken with its camera; this idea is
materialised in mOCRa and its first version for
Android devices.
The mOCRa client application offers
users an accessible and easy-to-use interface,
optimised capture of images containing text,
and tools to manage and edit the texts recovered
from the pictures.</p>
      <p>The application interface includes an
adaptable grid that the user can align
with the lines of text (see Figure 1); it
also allows setting the quality of the picture
to be taken according to the number of grid
lines. To ease the use of the
application, four quality levels have been defined in
mOCRa: low, medium, high and very high.
The “very high” level is suitable for analysing
full-page texts.</p>
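<p>As an illustration of this grid-to-quality mapping (the concrete thresholds below are hypothetical assumptions; the paper only names the four levels), selecting a quality level from the number of grid lines could look like:</p>

```python
def quality_from_grid_lines(grid_lines: int) -> str:
    """Map the number of grid lines the user has set on the capture screen
    to one of mOCRa's four quality levels. The thresholds are illustrative
    assumptions, not values taken from the application."""
    if grid_lines <= 2:
        return "low"
    if grid_lines <= 5:
        return "medium"
    if grid_lines <= 10:
        return "high"
    return "very high"  # the level suitable for full-page texts
```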
      <p>The process of obtaining the text from a
picture goes as follows: once an image is
captured with the phone, it is sent to the server.
The server processes it and returns the text to
the phone; the user of the device can then modify
the text using a text editor, store it,
send it via e-mail, or select a portion of
the text to be used as a query against Google. A
video demonstration of mOCRa can be found
on the IRLab website
(http://www.irlab.org/?q=publications/multimedia).</p>
    </sec>
    <sec id="sec-2">
      <title>System Architecture</title>
      <p>The mOCRa application was developed
for the Android platform, but it was also
designed to be easily migrated to other
mobile operating systems. To accomplish this,
the application follows a client-server
architecture; moreover, to guarantee
compatibility, communication is managed
using Web Services. Both the client and the server
systems follow a component-based architecture,
in order to build a functionally scalable application.</p>
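<p>As a rough sketch of the client side of such an exchange (the service URL and the JSON payload schema are assumptions for illustration; the paper does not document mOCRa's actual Web Service interface), building a request body could look like:</p>

```python
import base64
import json

# Hypothetical service URL; mOCRa's real endpoint is not published.
OCR_SERVICE_URL = "https://example.org/mocra/ws/recognise"

def build_ocr_request(image_bytes: bytes, quality: str = "high") -> bytes:
    """Serialise a captured image into the JSON body of a Web Service call.
    mOCRa encrypts image and text traffic with SSL; posting this body over
    HTTPS would provide that."""
    if quality not in ("low", "medium", "high", "very high"):
        raise ValueError("unknown quality level: " + quality)
    return json.dumps({
        "quality": quality,
        # Binary image data is base64-encoded so it survives the text payload.
        "image": base64.b64encode(image_bytes).decode("ascii"),
    }).encode("utf-8")
```

<p>A platform-specific HTTP client would then POST this body to the server and parse the returned text.</p>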
      <p>The mobile device system comprises the
following modules: a user interface module,
a Web Services based communication
management module, a stored texts management
module, and a public key management module
used to communicate with the server over secure
connections.</p>
      <p>The server system comprises the following
modules: an image pre-processing module, an OCR
analysis module, and a parsing module and a
business logic module for each
Web Service. The aforementioned Web
Services are used to configure the
application, to send the images and responses, and
to avoid sending data when the server
is overloaded.</p>
    </sec>
    <sec id="sec-3">
      <title>Evaluation</title>
      <p>The application was evaluated in terms of
effectiveness, to check that the results are
correct, and efficiency, to check that the results
are obtained in an acceptable time. The
reference for the comparison was another
mobile OCR application, SnapIt, available for
Android systems and developed by mocrsoft.
This system has been commercialised for a
while, and it is one of the main competitors
of mOCRa.</p>
      <sec id="sec-3-1">
        <title>Effectiveness</title>
        <p>
          In this analysis, the similarity between the
text obtained using the applications and the
original text was measured using the
Levenshtein distance (LD)
          <xref ref-type="bibr" rid="ref3">(Levenshtein, 1966)</xref>
          , in a similar way to its use in other evaluations
of OCR systems
          <xref ref-type="bibr" rid="ref1">(Borovikov, Zavorin and
Turner, 2004)</xref>
          , and the normalised Levenshtein
distance (NLD).
        </p>
        <p>The Levenshtein distance gives an
insight into the number of differences between
two sequences of characters: it measures the
minimum number of operations needed to
transform one of the sequences into the other,
where the operations are replacement,
addition and deletion of single characters.</p>
        <p>The normalised Levenshtein distance gives
an insight into the similarity of the texts: a
value of zero means that the sequences are
completely different, and a value of one means
that they are equal. Its formula is</p>
        <p>NLD = 1 − LD / LD<sub>Max</sub> (1)</p>
        <p>where LD<sub>Max</sub> is the length of the longest
text.</p>
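<p>For reference, both measures can be computed with the standard dynamic-programming algorithm; the sketch below is a generic implementation, not the evaluation code actually used in this work:</p>

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character replacements, additions and
    deletions needed to transform string a into string b."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion of ca
                current[j - 1] + 1,            # addition of cb
                previous[j - 1] + (ca != cb),  # replacement (free on a match)
            ))
        previous = current
    return previous[len(b)]

def normalised_levenshtein(a: str, b: str) -> float:
    """NLD = 1 - LD / LD_Max, where LD_Max is the length of the longest text:
    1.0 means the texts are equal, values near 0 mean they are very different."""
    ld_max = max(len(a), len(b))
    if ld_max == 0:
        return 1.0  # two empty texts are identical
    return 1.0 - levenshtein(a, b) / ld_max
```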
        <p>The testbed includes a heterogeneous set
of texts covering several significant
characteristics on which to compare the applications. SnapIt
does not allow loading images from the
memory of the phone: it only retrieves the text
from pictures taken with the camera, and it
does not store them. For this reason, the
images used to compare the results are not
exactly the same for both systems. To minimise
the impact of using different pictures, several
images were taken for each text and
application, and the one offering the best results for
each text was used in the evaluation.</p>
        <p>The applications were tested using two
Android devices with different specifications:
the HTC Magic and the Nexus One. Three
texts were used for the tests; a brief
explanation of each text follows, accompanied
by the comparison of the results for each
application and mobile phone.</p>
        <p>The first text used in the evaluation is an
economics text written in English. It is a full
page containing 3,729 characters, with a font
size of 12 pt. SnapIt only allows taking
pictures in landscape format, so it was
necessary to take two photographs to process the
entire page. The “very high” quality option
was used in mOCRa.</p>
        <p>Table 1 shows that the results of mOCRa
clearly outperform those obtained with
SnapIt. To deal with full-page images,
mOCRa includes a “very high” image
quality mode that optimises the results. SnapIt does
not offer a similar option; moreover, since
its pictures can only be taken in landscape
format, two photographs are needed.</p>
        <p>The second text contains 697 characters,
uses a font size of 12 pt, and is written in
English. This document contains the same
phrase repeated with different font types and
formats (boldface, italic and underlined). The
phrase does not make sense, but it is
intended to cover most of the usual characters:
“The (quick) brown {fox} jumps! over the
$3,456.78 &lt;lazy&gt; #90 dog &amp; duck/goose, as
12.5 % of E-mail”.</p>
        <p>The results in Table 2 show that mOCRa
again outperforms SnapIt. The configuration
used in mOCRa was the “high” quality one,
which is suitable for extracting the text of
several paragraphs.</p>
        <p>A complex table was chosen as the
document for the last comparison. Each cell
contains several lines of text; the total number of
characters is 1,387. The pictures include the
two upper rows of the table and its full width
(five columns), covering half of the page.</p>
        <p>From the results shown in Table 3 it can
be inferred that neither mOCRa nor SnapIt
applies layout analysis techniques to deal with
tables or multi-column layouts. The results
are quite poor: the distribution of the text
causes problems in the OCR process, and the lines
of the tables are extracted as extra
characters. In this test the mOCRa results include more
errors than the SnapIt ones (mOCRa has a
higher LD); however, its NLD value is
better. The main implication of this fact is that
mOCRa extracts more text, including noise,
from the image than SnapIt, while also being more
accurate in obtaining the real text.</p>
        <p>The tests also show an interesting fact:
SnapIt offers better results with the HTC
Magic device, despite the fact that its camera
quality is lower than that of the Nexus One.
Because of this, it is our belief that this
application uses some image processing techniques
that depend on the resolution of the
pictures. This does not happen with mOCRa,
which offers better results as the quality of the
images improves.</p>
      </sec>
      <sec id="sec-3-2">
        <title>Efficiency</title>
        <p>These tests were run on a Nexus One
device. It accessed the web through a wireless
802.11g connection (54 Mb/s) to
communicate with the server (Intel(R) Core(TM)2 Quad
CPU Q6600 @ 2.40 GHz, 4 GB of RAM). The
execution times shown in Tables 4 and 5 cover the
whole process, from the moment a picture is sent
for processing until the text is shown on the
mobile device.</p>
        <sec id="sec-3-2-1">
          <title>mOCRa</title>
          <p>The application allows the users to choose
the quality (size) of the picture to send. This
choice can be made by adapting the size of
the display grid, or by selecting one of the
four predefined quality levels. In the
tests, the data was collected for these
predefined levels. The computing time in mOCRa
includes:</p>
          <list list-type="order">
            <list-item><p>mobile: configuration sending,</p></list-item>
            <list-item><p>mobile: image splitting,</p></list-item>
            <list-item><p>mobile: creation and sending of the Web Service packages containing the image,</p></list-item>
            <list-item><p>server: image reconstruction,</p></list-item>
            <list-item><p>server: image pre-processing,</p></list-item>
            <list-item><p>server: OCR processing,</p></list-item>
            <list-item><p>server: text sending,</p></list-item>
            <list-item><p>mobile: text display.</p></list-item>
          </list>
          <p>It is also worth mentioning that the
communications involving images and text are
encrypted using SSL.</p>
          <p>To obtain the results, 10 pictures were
processed. Table 4 shows the time results: it can
be observed that the “very high” option is
very time-consuming, but it takes advantage
of the maximum quality the camera can offer.
As previously stated, mOCRa results improve
with higher-quality images.</p>
          <p>Regarding the time results, we must remark
that the step that most penalises the
execution time of mOCRa is sending the image:
this operation takes longer than the image
pre-processing and the retrieval of the text.</p>
          <p>[Table 4: mOCRa processing time (s)]</p>
        </sec>
        <sec id="sec-3-2-2">
          <title>SnapIt</title>
          <p>The tests run to check the efficiency of this
application were the same ones used to check the
efficiency of mOCRa. We cannot provide any
information about the way in which SnapIt
retrieves the text (we have no access to its
source code), but we can assume that it also
uses a server (since it cannot work without
an Internet connection), so the process
should share some similarities.</p>
          <p>Table 5 shows the results. SnapIt times
are better than those of mOCRa; therefore, it
is our belief that this application does not
take advantage of the quality of the cameras
integrated in the mobile phones.</p>
          <table-wrap id="tab5">
            <label>Table 5</label>
            <caption><p>SnapIt processing time (s)</p></caption>
            <table>
              <thead>
                <tr><th>Mean Time</th><th>Best Time</th><th>Worst Time</th></tr>
              </thead>
              <tbody>
                <tr><td>4.25</td><td>2.5</td><td>7</td></tr>
              </tbody>
            </table>
          </table-wrap>
        </sec>
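<p>The image splitting and server-side reconstruction steps of the mOCRa pipeline can be sketched as follows (the chunk size and package fields are illustrative assumptions; the paper does not specify the actual Web Service package format):</p>

```python
CHUNK_SIZE = 64 * 1024  # assumed 64 KiB per Web Service package

def split_image(image_bytes: bytes, chunk_size: int = CHUNK_SIZE) -> list:
    """Client side: split one picture into numbered packages so that each
    Web Service call carries a bounded amount of data."""
    chunks = [image_bytes[i:i + chunk_size]
              for i in range(0, len(image_bytes), chunk_size)] or [b""]
    return [{"index": n, "total": len(chunks), "data": chunk}
            for n, chunk in enumerate(chunks)]

def reconstruct_image(packages: list) -> bytes:
    """Server side: sort the received packages and rebuild the image."""
    ordered = sorted(packages, key=lambda p: p["index"])
    if len(ordered) != ordered[0]["total"]:
        raise ValueError("missing packages")
    return b"".join(p["data"] for p in ordered)
```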
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Conclusions and Future Work</title>
      <p>The application presented in this work,
mOCRa, shows good results. The system can
provide an excellent starting point to build
specific and complex systems, offering for
instance the generation of summaries, the
generation of snippets, entity detection or
language translation.</p>
      <p>
        Our next steps with mOCRa will
include improving the results through the
use of top-down layout detection
techniques, table boundary detection techniques
and text post-processing
techniques to detect noise and correct
badly recognised words. With these
improvements, the text could be used for complex
Information Retrieval tasks: the techniques
of
        <xref ref-type="bibr" rid="ref4">Parapar, Freire and Barreiro (2009)</xref>
        can
be applied to use these texts in IR systems;
moreover, the work of
        <xref ref-type="bibr" rid="ref2">Taghva, Borsack and Condit
(1994)</xref>
        states that if a picture is good enough
and post-processing is applied to the text,
the final result has the same quality as a text
manually created and corrected.
      </p>
      <p>Finally, it is worth mentioning that we
are working on image compression techniques
and on the improvement of the
communication protocols to obtain better results in
terms of efficiency.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <given-names>E.</given-names>
            <surname>Borovikov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Zavorin</surname>
          </string-name>
          and
          <string-name>
            <given-names>M.</given-names>
            <surname>Turner</surname>
          </string-name>
          .
          <year>2004</year>
          .
          <article-title>A filter based post-OCR accuracy boost system</article-title>
          .
          <source>In Proceedings of the 1st ACM workshop on Hardcopy document processing</source>
          , pages
          <fpage>23</fpage>
          -
          <lpage>28</lpage>
          , Washington, DC, (USA).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <given-names>K.</given-names>
            <surname>Taghva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Borsack</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Condit</surname>
          </string-name>
          .
          <year>1994</year>
          .
          <article-title>Results of applying probabilistic IR to OCR text</article-title>
          .
          <source>In Proceedings of the 17th annual international ACM SIGIR conference on Research and development in information retrieval</source>
          , pages
          <fpage>202</fpage>
          -
          <lpage>211</lpage>
          , Dublin, Ireland.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <given-names>V.I.</given-names>
            <surname>Levenshtein</surname>
          </string-name>
          .
          <year>1966</year>
          .
          <article-title>Binary Codes Capable of Correcting Deletions, Insertions, and Reversals</article-title>
          .
          <source>Soviet Physics Doklady</source>
          ,
          <volume>10</volume>
          (
          <issue>8</issue>
          ):
          <fpage>707</fpage>
          -
          <lpage>710</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <given-names>Javier</given-names>
            <surname>Parapar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Ana</given-names>
            <surname>Freire</surname>
          </string-name>
          and
          <string-name>
            <given-names>Álvaro</given-names>
            <surname>Barreiro</surname>
          </string-name>
          .
          <year>2009</year>
          .
          <article-title>Revisiting n-gram based models for retrieval in degraded large collections</article-title>
          .
          <source>In ECIR '09: Proceedings of the 31st European Conference on IR Research on Advances in Information Retrieval</source>
          , pages
          <fpage>680</fpage>
          -
          <lpage>684</lpage>
          , Berlin, Heidelberg. Springer-Verlag.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>