mOCRa: Mobile OCR Application∗

Xose R. De La Puente, IRLab, A Coruña Univ., Campus Elviña s/n, A Coruña (xose.puente.romay@udc.es)
Ismael Hasan, IRLab, ICT Centre, Campus Elviña s/n, A Coruña (ihasan@udc.es)

Abstract: In recent years, mobile phones have evolved into devices with high-resolution cameras and Internet connectivity. In this context, the idea arises of applying OCR techniques to the pictures taken with these devices. As a result of this idea, we have built mOCRa, the application presented in this work.

Keywords: Optical character recognition, mobile application, Android

1. Introduction

The evolution of mobile devices has been vertiginous. Two decades ago they were bulky devices with short battery life that could only be used to make phone calls. Nowadays, mobile phones are devices with multimedia capabilities, Internet access, etc., and their features include integrated digital cameras. With this in mind, the idea arises of using a mobile device to extract the text from the pictures taken with its camera. This idea is materialised in mOCRa, whose first version targets Android devices.

2. System Overview

The mOCRa client application offers users an accessible and easy-to-use interface, optimised capture of images containing text, and tools to manage and edit the texts recovered from the pictures.
The application interface includes an adaptable grid that the user can align with the lines of text (see Figure 1). It also lets the user set the quality of the picture to be taken according to the number of grid lines. To make the application easier to use, four quality levels have been defined in mOCRa: low, medium, high and very high. The "very high" level is suitable for analysing full-page texts.

Figure 1: mOCRa working with a Nexus One device

The process of obtaining the text from a picture goes as follows. Once an image is captured with the phone, it is sent to the server, which processes it and returns the text to the phone. The user of the device can then:

- modify the text using a text editor,
- store it,
- send it via e-mail,
- select a portion of the text to be used as a query against Google.

A video demonstration of mOCRa can be found on the IRLab website.¹

∗ This work was funded by FEDER, Ministerio de Ciencia e Innovación and Xunta de Galicia under projects TIN2008-06566-C04-04 and 07SIN005206PR.
¹ http://www.irlab.org/?q=publications/multimedia

3. System Architecture

The mOCRa application was developed for the Android platform, but it was also designed to be easily migrated to other mobile operating systems. To accomplish this, the application follows a client-server architecture. To guarantee compatibility, communication is managed using Web Services. Both the client and the server follow a component-based architecture, so that the application is functionally scalable.

The mobile device system comprises the following modules:

- a user interface module,
- a Web Services-based communication management module,
- a stored texts management module,
- a public key management module, used to communicate with the server over secure connections.

The server system comprises the following modules:

- an image pre-processing module,
- an OCR analysis module,
- a parsing module for each Web Service,
- a business logic module for each Web Service.

These Web Services are used to configure the application, to send the images and responses, and to avoid sending data when the server is overloaded.

4. Evaluation

The application was evaluated in terms of effectiveness, to check that the results are correct, and efficiency, to check that the results are obtained in an acceptable time. The baseline for comparison was another mobile OCR application, SnapIt, which is available for Android systems and developed by mocrsoft. SnapIt has been commercially available for a while, and it is one of the main competitors of mOCRa.

4.1. Effectiveness

In this analysis, we measured the similarity between the text obtained with each application and the original text. We used the Levenshtein distance (LD) (Levenshtein, 1966), as in other evaluations of OCR systems (E. Borovikov and Turner, 2004), and the normalised Levenshtein distance (NLD).

The Levenshtein distance quantifies the amount of difference between two sequences of characters. It is the minimum number of operations needed to transform one sequence into the other, where the allowed operations are the substitution, insertion and deletion of single characters.

The normalised Levenshtein distance quantifies their similarity: a value of zero means that the sequences are completely different, and a value of one means that they are equal. Its formula is

$$\mathrm{NLD} = 1 - \frac{\mathrm{LD}}{\mathrm{LD}_{\mathrm{Max}}} \qquad (1)$$

where LD_Max is the length of the longer of the two texts being compared.

The testbed includes a heterogeneous set of texts that cover several characteristics relevant for comparing the applications. SnapIt does not allow loading images from the phone's memory. It only retrieves the text from pictures taken with the camera, and it does not store those pictures. For this reason, the images used to compare the results are not exactly the same for both systems. To minimise the impact of using different pictures, several images were taken for each text and application, and the evaluation used the one offering the best results for each text.

The applications were tested on two Android devices with different specifications: the HTC Magic and the Nexus One. Three texts were used in the tests. A brief description of each text follows, together with the comparison of the results for each application and mobile phone.

The first text used in the evaluation is an economics text written in English. It is a full page containing 3,729 characters, with a font size of 12pt. SnapIt only allows taking pictures in landscape format, so two photographs were needed to process the entire page. The "very high" quality option was used in mOCRa.

Table 1 shows that mOCRa clearly outperforms SnapIt. To deal with full-page images, mOCRa includes a "very high" image quality mode that optimises the results. SnapIt does not offer a similar option. Moreover, because SnapIt pictures can only be taken in landscape format, two photographs are needed.

Table 1: Full page text similarity

Metric   SnapIt Nexus   SnapIt HTC   mOCRa Nexus   mOCRa HTC
LD       2430           1830         54            923
NLD      0.34835        0.50925      0.98552       0.75248

The second text contains 697 characters, uses a font size of 12pt, and is written in English. It contains the same phrase repeated with different font types and formats (boldface, italic and underlined). The phrase does not make sense, but it is intended to cover most of the usual characters: "The (quick) brown {fox} jumps! over the $3,456.78 #90 dog & duck/goose, as 12.5 % of E-mail".

The results in Table 2 show that mOCRa again outperforms SnapIt. mOCRa was configured with the "high" quality level, which is suitable for extracting the text from several paragraphs.

Table 2: Several fonts text similarity

Metric   SnapIt Nexus   SnapIt HTC   mOCRa Nexus   mOCRa HTC
LD       419            212          50            113
NLD      0.39885        0.69584      0.93084       0.84327

A complex table was chosen as the document for the last comparison. Each cell contains several lines of text, and the table contains 1,387 characters in total. The pictures include the two upper rows of the table across its full width (five columns), covering half of the page.

The results in Table 3 suggest that neither mOCRa nor SnapIt applies layout analysis techniques to deal with tables or multi-column layouts. The results are poor. The distribution of the text causes problems in the OCR process, and the ruling lines of the table are extracted as extra characters.

In this test, the mOCRa output contains more errors than the SnapIt output (mOCRa has a higher LD), yet its NLD value is better. Because LD_Max is the length of the longer text, the extra characters that mOCRa extracts make its output longer and increase the denominator in Equation (1). The main implication is that mOCRa extracts more spurious text from the noise of the image than SnapIt, but it is more accurate in recognising the real text.

Table 3: Table text similarity

Metric   SnapIt Nexus   SnapIt HTC   mOCRa Nexus   mOCRa HTC
LD       1077           1072         1258          1309
NLD      0.22350        0.22711      0.26818       0.24640

The tests also reveal an interesting fact: SnapIt offers better results with the HTC Magic, even though its camera quality is lower than that of the Nexus One. We therefore believe that SnapIt uses image processing techniques that depend on the resolution of the pictures. This does not happen with mOCRa, whose results improve as the quality of the images improves.

4.2. Efficiency

These tests were run on a Nexus One device. The device accessed the web through a wireless 802.11g connection (54 Mb/s) to communicate with the server.² The execution times shown in Tables 4 and 5 cover the whole process, from the moment a picture is sent for processing until the text is shown on the mobile device.

² Intel(R) Core(TM)2 Quad CPU Q6600 @ 2.40GHz, 4 GB of RAM

4.2.1. mOCRa

The application allows users to choose the quality (size) of the picture to send. This choice can be made either by adapting the size of the display grid or by selecting one of the four predefined quality levels. In the tests, data was collected for these predefined levels. The computing time in mOCRa includes:

1. mobile - configuration sending,
2. mobile - image splitting,
3. mobile - creation and sending of Web Service packages containing the image,
4. server - image reconstruction,
5. server - image pre-processing,
6. server - OCR processing,
7. server - text sending,
8. mobile - text display.

It is also worth mentioning that the communications involving images and text are encrypted using SSL.

To obtain the results, 10 pictures were processed. Table 4 shows that the "very high" option is very time-consuming, but it takes advantage of the maximum quality the camera can offer. As stated above, mOCRa results improve with image quality.

Table 4: mOCRa processing time (s)

Image quality   Mean   Best time   Worst time
Low             6      5           8
Medium          8      7           9
High            9      8           15
Very high       16.9   14          25

Regarding the time results, the step that most penalises the execution time of mOCRa is sending the image. This operation takes longer than the image pre-processing and the retrieval of the text.

4.2.2. SnapIt

The efficiency tests for SnapIt were the same as those used for mOCRa. We cannot describe how SnapIt retrieves the text, because we have no access to its source code. We can, however, assume that it also uses a server, since it cannot work without an Internet connection. The process should therefore share some similarities with that of mOCRa.

Table 5 shows the results. SnapIt is faster than mOCRa. We therefore believe that SnapIt does not take advantage of the quality of the cameras integrated in the mobile phones.

Table 5: SnapIt processing time (s)

Mean time   Best time   Worst time
4.25        2.5         7

5. Conclusions and Future Work

The application presented in this work, mOCRa, shows good results. The system can be an excellent starting point for building specific and complex systems that offer, for instance, summary generation, snippet generation, entity detection or language translation.

Our next work on mOCRa will aim to improve the results with three kinds of techniques:

- top-down layout detection,
- table boundary detection,
- text post-processing to detect the noise and correct badly recognised words.

With these improvements, the text could be used for complex Information Retrieval tasks. The techniques of Parapar, Freire, and Barreiro (2009) can be applied to use these texts in IR systems. Moreover, the work of K. Taghva and Condit (1994) states that if a picture is good enough and the text is post-processed, the final result has the same quality as a manually created and corrected text.

Finally, we are working on image compression techniques and on improving the communication protocols to obtain better results in terms of efficiency.

Bibliography

Borovikov, E., I. Zavorin, and M. Turner. 2004. A filter based post-OCR accuracy boost system. In Proceedings of the 1st ACM Workshop on Hardcopy Document Processing, pages 23–28, Washington, DC, USA.

Levenshtein, V. I. 1966. Binary Codes Capable of Correcting Deletions, Insertions, and Reversals. Soviet Phys. Dokl., 10(8):707–710.

Parapar, J., A. Freire, and Á. Barreiro. 2009. Revisiting n-gram based models for retrieval in degraded large collections. In ECIR '09: Proceedings of the 31st European Conference on IR Research on Advances in Information Retrieval, pages 680–684, Berlin, Heidelberg. Springer-Verlag.

Taghva, K., J. Borsack, and A. Condit. 1994. Results of applying probabilistic IR to OCR text. In Proceedings of the 17th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 202–211, Dublin, Ireland.
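The two effectiveness metrics of Section 4.1 are illustrated below with a minimal Python sketch. It is not the authors' evaluation code, and the function names levenshtein, nld and nld_texts are hypothetical. LD is computed with the standard dynamic-programming (Wagner-Fischer) recurrence over substitutions, insertions and deletions. NLD follows Equation (1), with LD_Max taken as the length of the longer text. As a check, the sketch assumes that the 3,729-character reference page is the longer text in the Table 1 runs, which is consistent with the published figures. Under that assumption it reproduces the reported mOCRa NLD values from their LD values.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character substitutions, insertions and
    deletions needed to turn a into b (Wagner-Fischer dynamic programming)."""
    # prev[j] is the distance between the previous prefix of a and b[:j]
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def nld(ld: int, ld_max: int) -> float:
    """Normalised Levenshtein distance, Equation (1): 1 - LD / LD_Max."""
    return 1.0 - ld / ld_max


def nld_texts(reference: str, ocr_output: str) -> float:
    """NLD between two texts, with LD_Max = length of the longer text."""
    ld_max = max(len(reference), len(ocr_output))
    if ld_max == 0:
        return 1.0  # two empty texts are identical
    return nld(levenshtein(reference, ocr_output), ld_max)


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))  # 3
    # Table 1, mOCRa on the Nexus One: LD = 54 on a 3,729-character page
    print(round(nld(54, 3729), 5))  # 0.98552
```

The quadratic-time recurrence is adequate for pages of a few thousand characters, as in this testbed. Keeping only two rows of the table bounds memory by the length of the shorter loop's string rather than the product of both lengths.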