<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Geometrically and Temporally Consistent Visual Annotation for Smart Glasses</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Kanade Sumino</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Naoya Wakita</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ikuhisa Mitsugami</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Hiroshima City University</institution>
          ,
          <addr-line>3-4-1, Ozuka-Higashi, Asaminami-ku, Hiroshima, 731-3194</addr-line>
          ,
          <country country="JP">JAPAN</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>In this study, we propose a wearable face recognition system built on commercially available smart glasses. The system makes two technical contributions. First, we propose a geometric calibration between the display area in the user's visual field and the camera mounted on the smart glasses, so that visual annotations are correctly overlaid on the physical world as seen by the user's eyes. Second, we propose a method for reducing the delay in showing the visual annotation, which maintains geometric and temporal consistency. We implemented the whole system and experimentally confirmed that it shows geometrically and temporally correct annotations.</p>
      </abstract>
      <kwd-group>
        <kwd>Wearable system</kwd>
        <kwd>face detection</kwd>
        <kwd>face recognition</kwd>
        <kwd>calibration</kwd>
        <kwd>multi-processing</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Many of us have experienced situations in which we
could not recall the name or affiliation of a person we
happened to meet, even though we knew his/her
face. It would be useful to have a system that
superimposes the person’s name and affiliation at his/her
face in our field of view. Such a system would also be useful
in many other situations, such as helping elderly people with
dementia to recall the people around them. In this study,
we thus propose a wearable face recognition system
using commercially available smart glasses. The system
performs face detection and recognition and
shows visual annotations on a transparent display in the
user’s field of view, which enables the user to know the
names and attributes of the people in front of him/her. There
are two technical challenges in realizing this system. The
first challenge is that the annotations must be shown
at the correct position, at the face in the user’s field of view; for
this we propose a geometric calibration method between
the display in the field of view and the camera. The second
challenge is that, even after the geometric calibration is done, the
delay of the face detection and visual annotation causes
misalignment of the annotation, since the user
and the person in front of him/her are always moving. To
solve this problem, we propose a multi-process
architecture in which face recognition and visual annotation
run as separate processes. This architecture significantly
reduces the delay and realizes geometrically and
temporally correct visual annotation.</p>
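      <p>As a rough illustration of the multi-process architecture introduced above, the following sketch runs a slow recognizer in its own process while a fast loop keeps annotating at the current face position with the most recent label. It is not the authors' implementation: the function names, the queue-based hand-off, and the simulated recognizer (a fixed name and a 0.2 s sleep) are all assumptions made for this example.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import multiprocessing as mp
import queue
import time

THRESHOLD = 0.5  # confidence threshold reported in the paper


def recognize(face_crop):
    """Hypothetical stand-in for a slow recognizer: returns (name, confidence)."""
    time.sleep(0.2)  # simulate the expensive recognition step
    return ("Alice", 0.9)


def recognition_worker(requests, results):
    """Slow process: recognize each requested crop and publish a label."""
    while True:
        face_crop = requests.get()
        if face_crop is None:  # sentinel value: shut down
            break
        name, confidence = recognize(face_crop)
        results.put(name if confidence >= THRESHOLD else "unknown")


def newest_result(results):
    """Drain the result queue without blocking; return the newest label or None."""
    newest = None
    while True:
        try:
            newest = results.get_nowait()
        except queue.Empty:
            return newest


def annotation_loop(detections, requests, results, frame_period=0.0):
    """Fast loop over (face_crop, position) pairs, one pair per camera frame."""
    label, awaiting, drawn = None, False, []
    for face_crop, position in detections:
        newest = newest_result(results)
        if newest is not None:
            label, awaiting = newest, False
        if label is None and not awaiting:
            requests.put(face_crop)  # ask once, never wait for the answer
            awaiting = True
        drawn.append((position, label))  # annotate at the current position
        time.sleep(frame_period)
    return drawn


def run_demo():
    """Start the recognizer as a separate process and run the fast loop."""
    requests, results = mp.Queue(), mp.Queue()
    worker = mp.Process(target=recognition_worker, args=(requests, results))
    worker.start()
    detections = [("crop", (x, 100)) for x in range(0, 300, 10)]
    drawn = annotation_loop(detections, requests, results, frame_period=0.03)
    requests.put(None)  # tell the worker to stop
    worker.join()
    return drawn
```
]]></preformat>
      <p>In this sketch the early frames carry no label (None) until the recognizer answers, while the drawn position is refreshed on every frame regardless. When trying it, call run_demo() from under an if __name__ == "__main__": guard, which Python's multiprocessing module requires on platforms that start child processes by spawning.</p>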
    </sec>
    <sec id="sec-2">
      <title>2. System Configuration</title>
      <p>
        Augmented Reality (AR) is a technology that
superimposes digital information on a person’s view of the
real world to provide a visually augmented
representation of reality. Most existing studies
use special devices such as Microsoft HoloLens or
video see-through VR goggles. Those devices are useful
because they offer functions for maintaining the geometric
and temporal consistency between the annotations and
the real world. However, due to their weight and unusual
appearance, they are not suitable for wearing in
daily life. For daily use, smart glasses
can be a good alternative. Though they usually lack
functions for maintaining geometric and temporal
consistency, they are small, lightweight, and look like ordinary
glasses, which are important characteristics for practical
use. In this study, therefore, we use the optically
transparent smart glasses EPSON Moverio BT-30E [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. The device has
binocular transparent displays that appear around the
center of the user’s field of view and a camera that captures
his/her field of view. Figure 1 shows an overview of the
system. The device is connected to a PC via USB, and
the displays and camera are recognized as an external
display and a webcam, respectively.
The system must then show the visual annotation at the face in the user’s field
of view. Since the positions of the camera and the eyes
are not identical, they must be geometrically calibrated
in advance so that the visual annotation appears at the correct
position. The following sections describe each step of
the proposed system.
      </p>
      <sec id="sec-2-1">
        <title>3.1. Face detection</title>
        <p>
          For face detection, we apply a Haar-like feature-based face detector [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]. When a face is detected within the
area corresponding to the display in the user’s field of
view, the face is cropped and saved in the system. When
multiple people are detected at the same time, the system
selects only the person closest to the center of the image.
        </p>
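        <p>The selection rule described above can be sketched as follows. This is a minimal illustration under stated assumptions rather than the system's actual code: detections are taken to be (x, y, w, h) rectangles, which is the form returned by OpenCV's cascade detector, a face counts as "within" the display area when its center lies inside that rectangle, and the helper names are hypothetical.</p>
        <preformat preformat-type="code"><![CDATA[
```python
def center(box):
    """Center of an (x, y, w, h) rectangle."""
    x, y, w, h = box
    return (x + w / 2.0, y + h / 2.0)


def inside(point, area):
    """True if point lies within the (x, y, w, h) rectangle area."""
    px, py = point
    x, y, w, h = area
    return x <= px <= x + w and y <= py <= y + h


def select_face(boxes, image_size, display_area):
    """Pick the detection nearest the image center among those in display_area.

    Returns None when no detection falls inside the display area.
    """
    cx, cy = image_size[0] / 2.0, image_size[1] / 2.0
    candidates = [b for b in boxes if inside(center(b), display_area)]
    if not candidates:
        return None

    def squared_distance(box):
        bx, by = center(box)
        return (bx - cx) ** 2 + (by - cy) ** 2

    return min(candidates, key=squared_distance)
```
]]></preformat>
        <p>The display area passed to select_face is the region of the camera image that corresponds to the display; the geometric calibration is what determines that region in practice.</p>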
        <p>
          OpenFace [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ], a popular face recognition library, is then applied to the cropped face images.
        </p>
        <p>OpenFace calculates similarities between the face image
and the images of people stored in a database and returns
a confidence level (in the range of 0 to 1) for each person. In
our system, we experimentally set the threshold
of the confidence level to 0.5.</p>
      </sec>
      <sec id="sec-2-1b">
        <title>3.2. Geometric Calibration for Visual Annotation</title>
        <p>While the face detection is performed on the images captured by the camera, the visual annotation for the detected face should be shown on the display in the user’s field of view. It is thus necessary to obtain the relation between the camera and display coordinates, as shown in Figure 2.</p>
        <p>To obtain the relation, we propose a homography-based calibration method. As a homography matrix is a 3 × 3 matrix with a scale ambiguity, it has 8 unknowns, which means that four pairs of points are required to calculate it. A common approach is to capture a display showing (no fewer than) four points with the camera to obtain the four pairs of points. In this system, however, the camera cannot capture the display. To solve this problem, we propose to obtain the four pairs of points indirectly, as follows. We first put a landmark point in the real world and ask a user wearing the system to gaze at the landmark while aligning each corner of the display with the landmark in his/her sight, while the camera captures the landmark, as shown in Figure 3. We then integrate the four cases corresponding to the four corners, as in Figure 4. As shown in this figure, this process gives four pairs of points on the camera image and the display in the user’s field of view, so that the homography matrix between those image coordinates can be calculated. Once the homography is obtained, the position of the face in camera coordinates can be transformed into display coordinates, so that the visual annotation can be shown correctly at the face in the user’s field of view, as shown in Figure 5.</p>
        <p>Note that this homography-based calibration assumes that 1) the four landmarks and the face to be recognized are located on the same plane, or 2) the centers of the eyes and the camera are colocated. Though neither assumption is strictly fulfilled, the second is reasonable because the distance between the eyes and the camera is small compared with the distance to the face. In addition, considering the first assumption, it is desirable that the landmark be at a depth similar to that of the face to be recognized in actual use.</p>
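        <p>The homography estimation from four point pairs can be sketched as below. This is an illustrative implementation, not the authors' code: it fixes the last matrix element to 1 (a common normalization that assumes this element is nonzero), solves the resulting 8 × 8 linear system directly, and uses hypothetical function names. In practice a library routine such as OpenCV's getPerspectiveTransform or findHomography could serve the same purpose.</p>
        <preformat preformat-type="code"><![CDATA[
```python
def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def homography_from_4_points(camera_pts, display_pts):
    """Estimate the 3x3 homography mapping camera points to display points.

    Each pair (x, y) -> (u, v) contributes two linear equations in the
    8 unknowns; the ninth element is fixed to 1 to remove the scale ambiguity.
    """
    a, b = [], []
    for (x, y), (u, v) in zip(camera_pts, display_pts):
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b) + [1.0]
    return [h[0:3], h[3:6], h[6:9]]


def camera_to_display(h, point):
    """Apply homography h to a camera-image point (with perspective division)."""
    x, y = point
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)
```
]]></preformat>
        <p>Here camera_pts would hold the four landmark positions observed in the camera image, one per display corner, and display_pts the corresponding corner coordinates of the display; camera_to_display then converts a detected face position into the position at which to draw the annotation.</p>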
      </sec>
      <sec id="sec-2-2">
        <p>Even after the geometric calibration, the visual annotation can still be misaligned due to the delay caused by the face recognition process. For example, as shown in Fig. 6, even when the annotation is drawn at the position where the face was detected, if the person in the real world moves during the process, a gap occurs between the annotation and the person in the user’s sight. In this study, we thus propose a method for reducing the delay by separating the whole process into one for face recognition and one for calculating the position of the visual annotation, considering that the face recognition process takes much longer than the others and that the identity of the person in front of the user does not change frequently, while the face position changes frame by frame. Figure 7 shows the idea of the proposed method. By separating the face recognition process from the others, the facial annotation can be shown at a high frame rate and with very little delay.</p>
        <p>Figure 7: Multi-process processing for reducing delays.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Experiments</title>
      <p>We experimentally evaluated the performance of the proposed method. We implemented the system and asked a participant (user) to keep looking at another person who was moving in front of him. It was confirmed that the visual annotation was always shown at the face in the user’s sight even when the person was moving. Table 1 shows the effect of the proposed method. By applying the multi-process method, the delay is reduced by 90%.</p>
    </sec>
    <sec id="sec-3">
      <title>5. Conclusion</title>
      <p>In this paper, we proposed a wearable face recognition system using commercially available smart glasses. The system performs face detection and recognition processing and shows visual annotations on a transparent display in the user’s field of view, which enables the user to know the names and attributes of the people in front of him/her. The main contributions of this system are 1) the geometric calibration between the camera and the display in the user’s sight, and 2) the multi-processing for reducing the delay in showing the visual annotation. We confirmed the effectiveness of those contributions by implementing the system and performing experiments.</p>
      <p>Future work includes making the system smaller and lighter for practical use. In the current system, the smart glasses are connected to a desktop PC, but in practical situations, the PC must be replaced by a wearable mobile PC or a smartphone. Another important issue for practical use is how to register the people to be recognized.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>[1] https://www.epson.jp/products/moverio/bt35e/</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>P.</given-names>
            <surname>Viola</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Jones</surname>
          </string-name>
          , “
          <article-title>Rapid object detection using a boosted cascade of simple features</article-title>
          ,”
          <source>Proc. IEEE Computer Society Conference on Computer Vision and Pattern Recognition</source>
          ,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>B.</given-names>
            <surname>Amos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Ludwiczuk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Satyanarayanan</surname>
          </string-name>
          , “
          <article-title>OpenFace: A general-purpose face recognition library with mobile applications</article-title>
          ,”
          <source>Tech. Rep. CMU-CS-16-118</source>
          , CMU School of Computer Science
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>