<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Assistive Mobile Application for the Blind</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Ismail Sahak</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ong Huey Fang</string-name>
          <email>ong.hueyfang@monash.edu</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Syuhada Abdul Rahman</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Faculty of Computing, University Malaysia of Computer Science &amp; Engineering</institution>
          ,
          <country country="MY">Malaysia</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>School of Information Technology, Monash University Malaysia</institution>
          ,
          <country country="MY">Malaysia</country>
        </aff>
      </contrib-group>
      <fpage>21</fpage>
      <lpage>25</lpage>
      <abstract>
        <p>One of the challenges faced by blind people is the difficulty in identifying objects and obtaining concise information about them. They can only rely on the senses of hearing, smell, taste or touch to engage with objects and gain some perspective of them. Hence, this paper presents a mobile application called Iris to aid blind people in “visualising” their surroundings through descriptive objects. Iris combines the multiple object detection and optical character recognition capabilities of the Microsoft Computer Vision API to turn smartphones into assistive devices that the blind can use in their daily activities.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>
        According to a study in 2010, visual impairment is a major
global health issue affecting 285 million people in six
World Health Organization regions. Approximately 39
million of them are blind, and 246 million with decreased
visual acuity
        <xref ref-type="bibr" rid="ref11">(Pascolini &amp; Mariotti, 2012)</xref>
        . Another study
reported that adults with visual impairment have difficulty
performing their daily tasks and need assistance in activities
such as reading, writing, shopping, driving and using the
computer
        <xref ref-type="bibr" rid="ref14">(Riddering, 2016)</xref>
        . Along with that, the use of
smartphones is prominent among the visually impaired
population. The advent of computer vision technologies, such as
object detection and optical character recognition (OCR), is
also promising for creating more effective mobile
applications that aid blind people in
identifying objects, texts, and spatial locations without having to
physically engage with them
        <xref ref-type="bibr" rid="ref13">(Ramkishor &amp; Rajesh, 2017)</xref>
        .
      </p>
      <p>One issue with current mobile applications for blind
people is that they can only detect a single object in a single
frame. In a real-life environment, there is a tendency for
objects to be close to each other. Therefore, it is crucial to
develop a mobile application that could detect not only one
but multiple objects in a single camera frame. Another issue
is that most applications (e.g. BlindTool and Aipoly)
respond without additional context or descriptions of objects
in relation to their surrounding environment. For instance, if an
apple is detected, the application speaks out the word “apple”
to the user. Little does the user know, the apple may be on
top of a table. Moreover, printed text is everywhere in our
daily life, such as on reports, receipts, bank statements,
product packages and medicine bottles. It is troublesome for
blind people if they are unable to read these texts.
Therefore, this paper presents a mobile application called
Iris for the use of blind people. The user can tap on the
screen to detect descriptive objects. A captured image is
sent to Microsoft Computer Vision API as a parameter to
retrieve values of detected objects and transcribed text in the
image. The mobile application then sorts the retrieved object
descriptions and text transcriptions based on confidence to
form a sentence. Finally, the application speaks out the
sentence to the user in a natural language.</p>
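      <p>To make this flow concrete, the following Python sketch outlines the pipeline under stated assumptions: the endpoint and subscription key are placeholders, the response fields follow the documented shape of the Computer Vision “analyze” operation, and the speak callback stands in for the platform’s text-to-speech service. It is an illustration, not Iris’s actual implementation.</p>
      <preformat>
# Illustrative sketch of the Iris pipeline (not the actual implementation).
# Assumes an Azure Computer Vision resource; ENDPOINT and KEY are placeholders.
import requests

ENDPOINT = "https://YOUR-RESOURCE.cognitiveservices.azure.com"  # placeholder
KEY = "YOUR-SUBSCRIPTION-KEY"  # placeholder
HEADERS = {"Ocp-Apim-Subscription-Key": KEY,
           "Content-Type": "application/octet-stream"}

def describe_image(image_bytes):
    """Send the captured image to the Computer Vision 'analyze' operation
    and return caption candidates sorted by confidence (highest first)."""
    resp = requests.post(
        ENDPOINT + "/vision/v3.2/analyze",
        params={"visualFeatures": "Description,Tags"},
        headers=HEADERS,
        data=image_bytes,
    )
    resp.raise_for_status()
    captions = resp.json().get("description", {}).get("captions", [])
    return sorted(captions, key=lambda c: c["confidence"], reverse=True)

def on_screen_tap(image_bytes, speak):
    """Form a sentence from the best caption and hand it to a
    text-to-speech callback (platform TTS on a real device)."""
    captions = describe_image(image_bytes)
    sentence = captions[0]["text"] if captions else "No objects detected"
    speak(sentence)
      </preformat>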
      <p>The rest of the paper is organised as follows. Section 2
gives an overview of and discusses some related works.
Section 3 presents the requirement analysis for the
application. Then, the proposed mobile application and its
design are presented in section 4. Subsequently, section 5
concludes this paper.</p>
    </sec>
    <sec id="sec-2">
      <title>2 Related Works</title>
      <p>
        In 1996, Malaysia’s National Eye Survey found that among
18,027 residents examined, the age-adjusted prevalence of
blindness and low vision was 0.29% and 2.44%,
respectively. Females had a higher age-adjusted prevalence of low
vision compared to males
        <xref ref-type="bibr" rid="ref15">(Zainal et al., 2002)</xref>
        . The authors
also highlighted that there is a need to evaluate the
accessibility and availability of eye care services and the barriers to
eye care utilisation in the country.
      </p>
      <p>
        The higher computational and storage capacity of mobile
devices, as well as the growing speed and coverage of mobile
internet, provides unique possibilities for the use of
smartphones as universal assistive devices
        <xref ref-type="bibr" rid="ref12">(Punchoojit &amp;
Hongwarittorrn, 2017)</xref>
        . The promising development in
computer vision, such as in optical character recognition
(OCR), makes it possible to create assistive devices with
camera-based assistive systems
        <xref ref-type="bibr" rid="ref5">(Dharmale &amp; Ingole, 2015)</xref>
        .
Text is widely used in our daily life and is an important form
of communication. For example, signboards with
directions and shop names contain important textual or
symbolic information that facilitates humans’ knowledge and
perception of the environment and supports activities
such as navigation. The ability to read textual or symbolic
information is essential for blind or visually
challenged persons. Having the ability to determine what objects
precisely are in front of them, along with any additional
information is indeed helpful for the blind
        <xref ref-type="bibr" rid="ref3">(Brady, Morris,
Zhong, White, &amp; Bigham, 2013)</xref>
        .
      </p>
      <p>
        This study benefits from the use of existing
accessibility technologies. People with visual impairment can
easily browse and navigate their smartphones using
accessibility features, and thus use the proposed mobile
application. Table 1 shows some of the accessibility features for
blind and low vision people, which are available in mobile
devices using iOS
        <xref ref-type="bibr" rid="ref1">(Apple, 2018)</xref>
        and Android
        <xref ref-type="bibr" rid="ref6">(Google,
2017)</xref>
        operating systems.
      </p>
      <table-wrap id="tbl1">
        <label>Table 1</label>
        <caption>
          <p>Accessibility features for blind and low vision users in iOS and Android.</p>
        </caption>
        <table>
          <thead>
            <tr>
              <th>iOS</th>
              <th>Android</th>
            </tr>
          </thead>
          <tbody>
            <tr><td>VoiceOver</td><td>TalkBack</td></tr>
            <tr><td>Speak Screen</td><td>Select to Speak</td></tr>
            <tr><td>Captioning &amp; audio descriptions</td><td>Audio &amp; on-screen text</td></tr>
            <tr><td>Dark mode and smart invert colours</td><td>Contrast and colour options</td></tr>
            <tr><td>Zoom and font adjustment</td><td>Change display and font size</td></tr>
            <tr><td>Magnifier</td><td>Magnification</td></tr>
            <tr><td>Accessibility Shortcuts</td><td>Interaction controls</td></tr>
            <tr><td>Dictation</td><td>Voice dictation</td></tr>
            <tr><td>Braille entry &amp; display</td><td>BrailleBack</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <p>
Object recognition brings forth a multitude of possibilities
in the modern world. This study also implements
multiple object detection technology. An object
detection algorithm typically creates a bounding box around
each object of interest to locate it within the image. However,
the algorithm does not necessarily draw just one bounding
box; in an object detection scenario, there could be many
bounding boxes representing different objects of interest within
the image, and the algorithm would not know how many beforehand
        <xref ref-type="bibr" rid="ref9">(Khurana &amp; Awasthi, 2013)</xref>
        .
      </p>
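      <p>As an illustration of this point, the following sketch (in Python, with placeholder endpoint and key, as before) parses an arbitrary number of bounding boxes from the Computer Vision v3.2 “detect” operation, whose documented response contains one rectangle per detected object.</p>
      <preformat>
# Sketch: parsing multiple bounding boxes from the Computer Vision
# v3.2 "detect" operation. Endpoint and key are placeholders.
import requests

def detect_objects(image_bytes, endpoint, key):
    """Return a list of (name, confidence, rectangle) for every
    detected object; the count is not known in advance."""
    resp = requests.post(
        endpoint + "/vision/v3.2/detect",
        headers={"Ocp-Apim-Subscription-Key": key,
                 "Content-Type": "application/octet-stream"},
        data=image_bytes,
    )
    resp.raise_for_status()
    results = []
    for obj in resp.json().get("objects", []):
        r = obj["rectangle"]  # {"x", "y", "w", "h"} in pixels
        results.append((obj["object"], obj["confidence"], r))
    return results
      </preformat>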
      <p>
        Faster R-CNN (Region-based Convolutional Neural Network),
developed by researchers at Microsoft, builds on
R-CNN, which takes a multi-phased approach to object detection.
R-CNN uses selective search to determine region proposals,
pushes these through a classification network and then uses
a Support Vector Machine (SVM) to classify the different
regions
        <xref ref-type="bibr" rid="ref7">(Hulstaert, 2018)</xref>
        . However, selective search is a
slow and time-consuming process that affects the performance
of the network. Therefore, the Faster R-CNN algorithm
eliminates selective search and lets the network
learn the region proposals
        <xref ref-type="bibr" rid="ref8">(Kawazoe et al., 2018)</xref>
        . This
object recognition technology is provided as an API service by
Microsoft, known as the Computer Vision API. One can use
Computer Vision in their application by using either a
native SDK or invoking the REST API directly
        <xref ref-type="bibr" rid="ref10">(Microsoft
Docs, 2020)</xref>
        .
Some of the existing mobile applications based on object
detection for the blind include BlindTool
        <xref ref-type="bibr" rid="ref4">(Cohen, 2015)</xref>
        and Aipoly Vision
        <xref ref-type="bibr" rid="ref2">(Aipoly, 2018)</xref>
        . Table 2 presents a
comparison of these mobile applications with the proposed one. Iris
is seen to be better than the others in terms of its output to the
user. Iris serves to give a better insight into the objects
detected by describing them. For instance, if the captured
picture shows a mug on a table, Iris would describe it as
“Mug on a table” instead of just saying “Mug”. In addition
to that functionality, Iris can read any text that is present
with an object. If, say, there are two drink cans that
are similar in dimension but different in brand, Iris
comes in useful because it can tell the user what they
are holding and what text is on the products, using the OCR
functionality. Ultimately, the proposed mobile application
helps in giving blind people a much better perspective of the
objects around them and is useful for their day-to-day
activities.
      </p>
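      <p>The drink-can example can be illustrated with a short, hypothetical helper that merges a description with any OCR text found on the product; build_sentence is an assumed name used for illustration only and is not code from Iris itself.</p>
      <preformat>
# Hypothetical helper illustrating how a description and OCR text
# could be merged into one spoken sentence (not Iris's actual code).
def build_sentence(caption, ocr_words):
    """caption: best description, e.g. 'a drink can on a shelf'.
    ocr_words: list of recognised words, e.g. ['Evaporated', 'Milk']."""
    sentence = caption.capitalize()
    if ocr_words:
        sentence += ". It reads: " + " ".join(ocr_words)
    return sentence

# Example: two similar cans become distinguishable by their labels.
print(build_sentence("a drink can on a shelf", ["Evaporated", "Milk"]))
# Prints: A drink can on a shelf. It reads: Evaporated Milk
      </preformat>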
    </sec>
    <sec id="sec-3">
      <title>3 Requirement Analysis</title>
      <p>Requirement analysis is done to understand what will be
built, why it should be built, and in what order it should be
built. This section explains in detail the needs of the
target users of this application, which are blind people.
Interviews were conducted with three voluntary respondents,
courtesy of the Malaysian Association for the Blind in Kuala
Lumpur. Two of them are male, and one is female. They
are aged between 24 and 31 years old. The purpose of the
interviews was to understand the challenges that blind
people face in identifying objects, to get their perception
regarding mobile applications, and to get their inputs on the
development of Iris. The following subsections discuss the
results of the interviews.</p>
    </sec>
    <sec id="sec-4">
      <title>3.1 Challenges in Identifying Objects</title>
      <p>Most of the respondents have problems identifying new objects even
by touching them. This means that when a product that they
have been buying changes its packaging, they
would have difficulty identifying the product without
someone telling them what it is. Problems also arise when they
are unable to touch something: they cannot determine
what objects are present in their proximity, or even the
environment or location they are situated in. They would need to
depend on other senses, such as sound and smell, which
could be tricky in a new environment. The respondents
agreed that they prefer to know what environment they are
in without having to feel around.</p>
      <p>In addition, the respondents have trouble distinguishing two
physically similar objects. They are not able to tell one
object from the other when the objects have the same textures
or properties when touched. Hence, they need the ability to
differentiate objects. For example, they might want to
differentiate a can of tomato soup from a can of
evaporated milk. This problem also relates to their inability to
read texts that are not in braille.</p>
    </sec>
    <sec id="sec-5">
      <title>3.2 Perceptions of Using Mobile Applications</title>
      <p>All of the respondents have their own smartphones.
However, their primary use of a smartphone is merely to text
and call. They use the assistance features that come with their
phones to navigate and interact with the phone’s interfaces.
The respondents did not use any assistive applications on
their smartphones, but they would welcome any applications
that could support their visual needs. One of the respondents
shared that they wanted a mobile application that could tell
them if their hair or clothes are messy. Another respondent shared
that he wanted an application that would tell him if there are
things in front of him and give him descriptions of
certain items. The respondent also added that he wanted an
application to manage his medications because they can be
confusing sometimes.</p>
    </sec>
    <sec id="sec-6">
      <title>3.3 Features in Iris</title>
      <p>For usability reasons, all of the respondents agreed that
tapping the screen would be the best way to take a picture.
This is presumably because it is easier to just tap anywhere on the
screen than to locate a button. One key feature
suggested by a respondent was an auto-flash feature. This
feature is useful to blind people because they
would not know if the scene is dark; having the
auto-flash feature helps with the overall usability of the
application. Another key feature suggested was repeating the
application’s instructions. This feature allows
new users to hear the instructions again without having
to go through the process of retaking the picture. This part
of the interview helped in designing the application to
be more user-friendly for blind people.</p>
    </sec>
    <sec id="sec-7">
      <title>4 The Proposed Mobile Application</title>
    </sec>
    <sec id="sec-8">
      <title>4.1 Architecture</title>
      <p>After an image is uploaded, the Computer Vision API outputs
tags based on the objects, living beings, and actions
identified in the image. Tagging is not limited to the main subject,
such as a person in the foreground, but also includes the
setting (indoor or outdoor), furniture, tools, plants, animals,
accessories, gadgets and others. The Computer Vision API’s
algorithms analyse the content of an image. This analysis
forms the foundation for a description displayed in
complete sentences. The algorithms generate
various descriptions based on the objects identified in the
image. Each description is evaluated, and a confidence
score is generated. An ordered list is then generated from the
highest confidence score to the lowest.</p>
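      <p>A minimal sketch of this ordering step is shown below, assuming a response shaped like the documented “analyze” result; the sample values are made up for illustration.</p>
      <preformat>
# Sketch: ordering caption candidates by confidence, assuming the
# documented shape of an "analyze" response (values are made up).
sample_response = {
    "description": {
        "tags": ["indoor", "table", "mug"],
        "captions": [
            {"text": "a cup sitting on a desk", "confidence": 0.71},
            {"text": "a mug on a table", "confidence": 0.92},
        ],
    }
}

captions = sample_response["description"]["captions"]
ordered = sorted(captions, key=lambda c: c["confidence"], reverse=True)
for c in ordered:
    print(round(c["confidence"], 2), c["text"])
      </preformat>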
      <p>
        OCR technology detects text content in an image and
extracts the identified text into a machine-readable character
stream
        <xref ref-type="bibr" rid="ref13">(Ramkishor &amp; Rajesh, 2017)</xref>
        . This technology can be
used for search and numerous other purposes, such as medical
records, security and banking. It automatically detects the
language; OCR supports 25 languages, and the accuracy of
text recognition depends on the quality of the image.
Inaccurate detections may be caused by blurry images,
handwritten or cursive text, artistic font styles, small text size,
complex backgrounds, shadows or glare over text, perspective
distortion, oversized or missing capital letters at the
beginnings of words, and subscript, superscript, or strikethrough text.
      </p>
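      <p>The following sketch, under the same placeholder endpoint and key assumptions as before, flattens the regions, lines and words hierarchy returned by the Computer Vision v3.2 “ocr” operation into a plain character stream.</p>
      <preformat>
# Sketch: extracting text via the Computer Vision v3.2 "ocr" operation
# and flattening its regions/lines/words hierarchy into plain text.
import requests

def read_text(image_bytes, endpoint, key):
    resp = requests.post(
        endpoint + "/vision/v3.2/ocr",
        params={"detectOrientation": "true"},  # language is auto-detected
        headers={"Ocp-Apim-Subscription-Key": key,
                 "Content-Type": "application/octet-stream"},
        data=image_bytes,
    )
    resp.raise_for_status()
    lines = []
    for region in resp.json().get("regions", []):
        for line in region.get("lines", []):
            words = [w["text"] for w in line.get("words", [])]
            lines.append(" ".join(words))
    return "\n".join(lines)
      </preformat>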
    </sec>
    <sec id="sec-9">
      <title>4.2 User Interfaces</title>
      <p>The resulting output is also spoken to the user. As previously
discussed, in this specific case, the API was not able to return a
description. Hence, the tags from the result’s JSON are
accessed and mapped, objects to their characteristics. This
makes it possible for the application to output a result even
when the Computer Vision API cannot construct a
description from the picture.</p>
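      <p>A hedged sketch of this fallback is given below; it assumes the “tags” array of the analyze response and simplifies whatever mapping Iris performs to a list of the highest-confidence tag names.</p>
      <preformat>
# Hypothetical fallback: when no caption is returned, speak the
# highest-confidence tags instead (a simplification of Iris's mapping).
def fallback_sentence(analyze_json, max_tags=3):
    captions = analyze_json.get("description", {}).get("captions", [])
    if captions:
        return captions[0]["text"]
    tags = sorted(analyze_json.get("tags", []),
                  key=lambda t: t["confidence"], reverse=True)
    names = [t["name"] for t in tags[:max_tags]]
    return ", ".join(names) if names else "Nothing recognised"
      </preformat>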
      <p>Figure 4 (a) shows an output when there are texts associated
with the object detected. It can be seen that the outputs from
the two APIs used are combined within the application to
create an output that is intuitive to the user. Figure 4
(b) shows an exception case when Iris is opened in a
poorly-lit environment. The figure depicts the light source
coming from the smartphone’s flashlight. In this case, the
application detected that the ambience value in the dark
room was too low. To counter that issue, Iris automatically
toggles the smartphone’s flashlight on. This feature makes the
application more accessible to blind people, who
cannot tell if they are in a dark environment while using
Iris, by helping to illuminate objects in the dark.</p>
    </sec>
    <sec id="sec-10">
      <title>5 Conclusion</title>
      <p>In a nutshell, this paper adopted multiple object recognition
and OCR technologies to develop an assistive mobile
application for blind people. Iris helps to identify objects and
produce descriptive texts from a picture taken by a camera.
It serves to describe and distinguish objects so that the blind
can have a better insight into the objects in their
surroundings. The overall outcome was satisfactory, considering that
blind people can make use of the proposed mobile
application to tackle problems in their daily activities, hence aiding
them towards independence. However, there are plenty of
features that could be implemented to improve the
application, such as lower latency and face detection. With 5G
technology becoming widely available in the future, the application
could access advanced algorithms and image processing
services in the cloud and retrieve the results almost
instantaneously. Moreover, the application could be updated to
detect and describe human faces, which would be a great support
in the communication of blind people with others.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <surname>Apple</surname>
          </string-name>
          (
          <year>2018</year>
          ).
          <source>Accessibility on iOS</source>
          . Retrieved from https://developer.apple.com/accessibility/ios/
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <surname>Aipoly</surname>
          </string-name>
          (
          <year>2018</year>
          ).
          <article-title>Aipoly Vision: Sight for Blind &amp; Visually Impaired on the App Store</article-title>
          . Retrieved from https://itunes.apple.com/us/app/aipoly-vision-sightforblind-visually-impaired/id1069166437?mt=8
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <surname>Brady</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Morris</surname>
            ,
            <given-names>M. R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhong</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>White</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Bigham</surname>
            ,
            <given-names>J. P.</given-names>
          </string-name>
          (
          <year>2013</year>
          ).
          <article-title>Visual challenges in the everyday lives of blind people</article-title>
          .
          <source>Paper presented at the Proceedings of the SIGCHI Conference on Human Factors in Computing Systems</source>
          , Paris, France.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <surname>Cohen</surname>
            ,
            <given-names>J. P.</given-names>
          </string-name>
          (
          <year>2015</year>
          ).
          <article-title>BlindTool - A mobile app that gives a “sense of vision” to the blind with deep learning</article-title>
          , https://github.com/ieee8023/blindtool
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <surname>Dharmale</surname>
            ,
            <given-names>R. D.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Ingole</surname>
            ,
            <given-names>D. P. V.</given-names>
          </string-name>
          (
          <year>2015</year>
          ).
          <article-title>Text Detection and Recognition with Speech Output in Mobile Application for Assistance to Visually Challenged Person</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <surname>Google</surname>
          </string-name>
          (
          <year>2017</year>
          ).
          <article-title>Android accessibility overview - Android Accessibility Help</article-title>
          . Retrieved from https://support.google.com/accessibility/android/answer/6006564
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <surname>Hulstaert</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          (
          <year>2018</year>
          ).
          <article-title>A Beginner's Guide to Object Detection</article-title>
          . Retrieved from https://www.datacamp.com/community/tutorials/object-detection-guide
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <surname>Kawazoe</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shimamoto</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yamaguchi</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shintani-Domoto</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Uozaki</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fukayama</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Ohe</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          (
          <year>2018</year>
          ).
          <article-title>Faster R-CNN-based glomerular detection in multistained human whole slide images</article-title>
          .
          <source>Journal of Imaging</source>
          ,
          <volume>4</volume>
          (
          <issue>7</issue>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <surname>Khurana</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Awasthi</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          (
          <year>2013</year>
          ).
          <article-title>Techniques for Object Recognition in Images and Multi-Object Detection</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <surname>Microsoft Docs</surname>
          </string-name>
          (
          <year>2020</year>
          ).
          <article-title>What is Computer Vision? - Computer Vision - Azure Cognitive Services</article-title>
          . Retrieved from https://docs.microsoft.com/en-us/azure/cognitive-services/computer-vision/home
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <string-name>
            <surname>Pascolini</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Mariotti</surname>
            ,
            <given-names>S. P.</given-names>
          </string-name>
          (
          <year>2012</year>
          ).
          <article-title>Global estimates of visual impairment: 2010</article-title>
          .
          <source>British Journal of Ophthalmology</source>
          ,
          <volume>96</volume>
          (
          <issue>5</issue>
          ),
          <fpage>614</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <string-name>
            <surname>Punchoojit</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Hongwarittorrn</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          (
          <year>2017</year>
          ).
          <article-title>Usability Studies on Mobile User Interface Design Patterns: A Systematic Literature Review</article-title>
          .
          <source>Advances in Human-Computer Interaction</source>
          ,
          <year>2017</year>
          ,
          <volume>6787504</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          <string-name>
            <surname>Ramkishor</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Rajesh</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          (
          <year>2017</year>
          ).
          <article-title>Artificial Vision for Blind Peoples using OCR Technology</article-title>
          .
          <source>International Journal of Emerging Trends &amp; Technology in Computer Science</source>
          ,
          <volume>6</volume>
          (
          <issue>3</issue>
          ),
          <fpage>30</fpage>
          -
          <lpage>33</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          <string-name>
            <surname>Riddering</surname>
            ,
            <given-names>A. T.</given-names>
          </string-name>
          (
          <year>2016</year>
          ).
          <article-title>Visual Impairment and Factors Associated with Difficulties with Daily Tasks</article-title>
          (Doctoral dissertation, Western Michigan University). Retrieved from https://scholarworks.wmich.edu/dissertations/2465
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          <string-name>
            <surname>Zainal</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ismail</surname>
            ,
            <given-names>S. M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ropilah</surname>
            ,
            <given-names>A. R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Elias</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Arumugam</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Alias</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          , . . .
          <string-name>
            <surname>Goh</surname>
            ,
            <given-names>P. P.</given-names>
          </string-name>
          (
          <year>2002</year>
          ).
          <article-title>Prevalence of blindness and low vision in Malaysian population: results from the National Eye Survey 1996</article-title>
          .
          <source>The British journal of ophthalmology</source>
          ,
          <volume>86</volume>
          (
          <issue>9</issue>
          ),
          <fpage>951</fpage>
          -
          <lpage>956</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>