mOCRa: Mobile OCR Application∗

             Xose R. De La Puente                            Ismael Hasan
              IRLab, A Coruña Univ.                       IRLab, ICT Centre
                Campus Elviña s/n                         Campus Elviña s/n
                     A Coruña                                  A Coruña
             xose.puente.romay@udc.es                         ihasan@udc.es

      Abstract: In recent years, mobile phones have evolved into devices with high-resolution cameras and Internet connections. In this context, the idea arises of applying OCR techniques to the pictures taken with these devices. As a result of this idea, we have built mOCRa, the application presented in this work.
      Keywords: Optical character recognition, mobile application, Android

1.   Introduction
   The evolution of mobile devices has been rapid: two decades ago they were bulky devices with short battery life that could only be used to make phone calls; nowadays, mobile phones offer multimedia capabilities, Internet access, and more. Their features include integrated digital cameras. With this in mind, the idea arises of using a mobile device to extract the text from pictures taken with its camera; this idea is realised in mOCRa, whose first version targets Android devices.

∗ This work was funded by FEDER, Ministerio de Ciencia e Innovación and Xunta de Galicia under projects TIN2008-06566-C04-04 and 07SIN005206PR.

2.   System Overview
   The mOCRa client application offers users an accessible and easy-to-use interface, optimised capture of images containing text, and tools to manage and edit the texts recovered from the pictures.
   The application interface includes an adaptable grid that the user can align with the lines of text (see Figure 1); the grid also sets the quality of the picture to be taken, according to the number of grid lines. To ease the use of the application, four quality levels have been defined in mOCRa: low, medium, high and very high. The “very high” level is suitable for analysing full-page texts.

   Figure 1: mOCRa working with a Nexus One device

   The process of obtaining the text from a picture goes as follows: once an image is captured with the phone, it is sent to the server. The server processes it and returns the text to
the phone; the user can then modify the text in a text editor, store it, send it via e-mail, or select a portion of it to use as a query against Google. A video demonstration of mOCRa can be found on the IRLab website¹.

¹ http://www.irlab.org/?q=publications/multimedia

3.   System Architecture
   mOCRa was developed for the Android platform, and it was designed to be easily migrated to other mobile operating systems. To accomplish this, the application follows a client-server architecture; moreover, to guarantee compatibility, communication is handled through Web Services. Both the client and the server follow a component-based architecture, so that the functionality of the application can be extended.
   The mobile device system comprises the following modules: a user interface module, a Web Services based communication management module, a stored texts management module, and a public key management module used to communicate with the server over secure connections.
   The server system comprises the following modules: an image pre-processing module, an OCR analysis module, a parsing module for each Web Service, and a business logic module for each Web Service. These Web Services are used to configure the application, to send the images and responses, and to avoid sending data when the server is overloaded.

4.   Evaluation
   The application was evaluated in terms of effectiveness, to check that the results are correct, and efficiency, to check that the results are obtained in an acceptable time. The baseline was another mobile OCR application, SnapIt, available for Android systems and developed by mocrsoft. SnapIt has been on the market for some time, and it is one of the main competitors of mOCRa.

4.1.   Effectiveness
   In this analysis, the similarity between the text obtained by each application and the original text was measured using the Levenshtein distance (LD) (Levenshtein, 1966), as in other evaluations of OCR systems (Borovikov, Zavorin and Turner, 2004), and the normalised Levenshtein distance (NLD).
   The Levenshtein distance indicates how much two sequences of characters differ. It is the minimum number of operations needed to transform one sequence into the other, where the operations are the replacement, insertion and deletion of single characters.
   The normalised Levenshtein distance expresses similarity: a value of zero means that the sequences are completely different, and a value of one means that they are equal. Its formula is

   $$NLD = 1 - \frac{LD}{LD_{Max}} \qquad (1)$$

where $LD_{Max}$ is the length of the longer of the two texts.
   The testbed includes a heterogeneous set of texts chosen to cover several characteristics that are relevant when comparing the applications. SnapIt cannot load images from the memory of the phone: it only retrieves the text from pictures taken with the camera, and it does not store them. For this reason, the images used to compare the results are not exactly the same for both systems. To minimise the impact of using different pictures, several images were taken for each text and application, and the one offering the best results for each text was used in the evaluation.
   The applications were tested on two Android devices with different specifications: an HTC Magic and a Nexus One. Three texts were used; a brief description of each text follows, together with a comparison of the results for each application and mobile phone.
   The first text used in the evaluation is an economics text written in English. It is a full page containing 3,729 characters, with a font size of 12pt. SnapIt only allows pictures to be taken in landscape format, so it was necessary to take two photographs to process the entire page. The “very high” quality option was used in mOCRa.
   Table 1 shows that the results of mOCRa clearly outperform those obtained with SnapIt. To deal with full-page images, mOCRa includes a “very high” image quality mode to optimise the results. SnapIt does not offer a similar option; moreover, because the pictures can only be taken
in landscape format, it is necessary to take two photographs.

                   SnapIt              mOCRa
     Metric
                Nexus       HTC     Nexus       HTC
     LD          2430      1830        54       923
     NLD      0.34835   0.50925   0.98552   0.75248

      Table 1: Full page text similarity

   The second text contains 697 characters, uses a font size of 12pt, and is written in English. This document contains the same phrase repeated with different font types and formats (boldface, italic and underlined). The phrase does not make sense, but it is intended to cover most of the usual characters: “The (quick) brown {fox} jumps! over the $3,456.78 <lazy> #90 dog & duck/goose, as 12.5 % of E-mail”.
   The results in Table 2 show that mOCRa again outperforms SnapIt. The configuration used in mOCRa was the “high” quality one, which is suitable for extracting the text from several paragraphs.

                   SnapIt              mOCRa
     Metric
                Nexus       HTC     Nexus       HTC
     LD           419       212        50       113
     NLD      0.39885   0.69584   0.93084   0.84327

      Table 2: Several fonts text similarity

   A complex table was chosen as the document for the last comparison. Each cell contains several lines of text; the total number of characters is 1,387. The pictures include the two upper rows of the table and its full width (five columns), covering half of the page.
   From the results shown in Table 3 it can be inferred that neither mOCRa nor SnapIt applies layout analysis techniques to deal with tables or multi-column layouts. The results are poor: the distribution of the text causes problems in the OCR process, and the lines of the table are extracted as extra characters. In this test the mOCRa results include more errors than the SnapIt ones (mOCRa has a higher LD); however, its NLD value is better. The main implication of this fact is that mOCRa extracts more text from the noise of the image than SnapIt, but it is more accurate in obtaining the real text.

                   SnapIt              mOCRa
     Metric
                Nexus       HTC     Nexus       HTC
     LD          1077      1072      1258      1309
     NLD      0.22350   0.22711   0.26818   0.24640

      Table 3: Table text similarity

   The tests also show an interesting fact: SnapIt offers better results with the HTC Magic device, despite the fact that its camera quality is lower than that of the Nexus One. Because of this, it is our belief that this application uses some image processing techniques which are dependent on the resolution of the pictures. This does not happen with mOCRa, which offers better results as the quality of the images improves.

4.2.   Efficiency
   These tests were run on a Nexus One device. It accessed the web through a wireless 802.11g connection (54 Mb/s) to communicate with the server². The execution times shown in Tables 4 and 5 cover the process from the moment a picture is sent for processing until the text is shown on the mobile device.

² Intel(R) Core(TM)2 Quad CPU Q6600 @ 2.40GHz, 4 GB of RAM

4.2.1. mOCRa
   The application allows users to choose the quality (size) of the picture to send. This choice can be made by adapting the size of the display grid, or by selecting one of the four predefined quality levels. In the tests, the data was collected for these predefined levels. The computing time in mOCRa includes:

 1. mobile - configuration sending,
 2. mobile - image splitting,
 3. mobile - creation and sending of Web Service packages containing the image,
 4. server - image reconstruction,
 5. server - image pre-processing,
 6. server - OCR processing,
 7. server - text sending,
 8. mobile - text display.

Also, it is worth mentioning that the communications involving images and text are encrypted using SSL.
   To obtain the results, 10 pictures were processed. Table 4 shows the timing results: the “very high” option is very time-consuming, but it takes advantage of the maximum quality the camera can offer. As previously stated, mOCRa results improve with image quality.
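   Steps 2 to 4 of the list above (image splitting, packaging and reconstruction) can be illustrated with a minimal Python sketch. The paper does not describe the package format, so the `ImagePackage` fields, the fixed `package_size` and both function names are assumptions made purely for illustration; this is not mOCRa's actual protocol, and the transport layer (Web Services over SSL) is left out.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class ImagePackage:
    """Hypothetical payload carrying one slice of the encoded image."""
    index: int   # position of this slice within the image
    total: int   # number of slices the image was split into
    data: bytes  # raw bytes of the slice


def split_image(image: bytes, package_size: int) -> List[ImagePackage]:
    """Mobile side (step 2): cut the encoded image into fixed-size slices."""
    if package_size <= 0:
        raise ValueError("package_size must be positive")
    # An empty image still yields one (empty) slice so that it round-trips.
    slices = [image[i:i + package_size]
              for i in range(0, len(image), package_size)] or [b""]
    return [ImagePackage(i, len(slices), s) for i, s in enumerate(slices)]


def reconstruct_image(packages: Iterable[ImagePackage]) -> bytes:
    """Server side (step 4): reorder the slices and join them back together."""
    ordered = sorted(packages, key=lambda p: p.index)
    if not ordered:
        raise ValueError("no packages received")
    # Every index from 0 to total - 1 must be present exactly once.
    if [p.index for p in ordered] != list(range(ordered[0].total)):
        raise ValueError("missing or duplicated package")
    return b"".join(p.data for p in ordered)


if __name__ == "__main__":
    image = bytes(range(256)) * 10  # stand-in for a JPEG payload
    packages = split_image(image, package_size=1000)
    # Packages may arrive out of order; reconstruction sorts them by index.
    assert reconstruct_image(reversed(packages)) == image
```

   Carrying the slice index and the total count in every package is one simple way to let the receiving side detect lost or duplicated slices before any OCR work starts.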
   Regarding the timing results, the step that most penalises the execution time of mOCRa is sending the image. This operation takes longer than the image pre-processing and the retrieval of the text.

           mOCRa processing time (s)
 Image quality   Mean      Best time   Worst time
     Low          6            5            8
     Medium       8            7            9
     High         9            8           15
     Very high   16.9         14           25

       Table 4: mOCRa processing time.

4.2.2. SnapIt
   The tests run to check the efficiency of this application were the same as those used for mOCRa. We cannot provide any information about the way in which SnapIt retrieves the text (we have no access to its source code), but we can assume that it also uses a server (since it cannot work without a connection to the Internet), so the process should share some similarities.
   Table 5 shows the results. SnapIt's times are better than those of mOCRa; therefore, it is our belief that this application does not take advantage of the quality of the cameras integrated in the mobile phones.

         SnapIt processing time (s)
     Mean time    Best time    Worst time
        4.25         2.5            7

        Table 5: SnapIt processing time.

5.     Conclusions and Future Work
   The application presented in this work, mOCRa, shows good results. The system can provide an excellent starting point for building specific and complex systems offering, for instance, summary generation, snippet generation, entity detection or language translation.
   Our future work on mOCRa will include improving the results through top-down layout detection techniques, table boundary detection techniques, and text post-processing techniques to detect noise and to correct badly recognised words. With these improvements, the text could be used for complex Information Retrieval tasks: the techniques of Parapar, Freire, and Barreiro (2009) can be applied to use these texts in IR systems; moreover, the work of Taghva, Borsack and Condit (1994) states that if a picture is good enough and the text is post-processed, the final result has the same quality as a text manually created and corrected.
   Finally, it is worth mentioning that we are working on image compression techniques and on improving the communication protocols to obtain better results in terms of efficiency.

References
Borovikov, E., I. Zavorin, and M. Turner. 2004. A filter based post-OCR accuracy boost system. In Proceedings of the 1st ACM workshop on Hardcopy document processing, pages 23–28, Washington, DC, USA.

Levenshtein, V.I. 1966. Binary Codes Capable of Correcting Deletions, Insertions, and Reversals. Soviet Phys. Dokl., 10(8):707–710.

Parapar, Javier, Ana Freire, and Álvaro Barreiro. 2009. Revisiting n-gram based models for retrieval in degraded large collections. In ECIR ’09: Proceedings of the 31th European Conference on IR Research on Advances in Information Retrieval, pages 680–684, Berlin, Heidelberg. Springer-Verlag.

Taghva, K., J. Borsack, and A. Condit. 1994. Results of applying probabilistic IR to OCR text. In Proceedings of the 17th annual international ACM SIGIR conference on Research and development in information retrieval, pages 202–211, Dublin, Ireland.