=Paper=
{{Paper
|id=Vol-2068/symcollab2
|storemode=property
|title=Selfie Guidance System in Good Head Postures
|pdfUrl=https://ceur-ws.org/Vol-2068/symcollab2.pdf
|volume=Vol-2068
|authors=Naihui Fang,Haoran Xie,Takeo Igarashi
|dblpUrl=https://dblp.org/rec/conf/iui/FangXI18
}}
==Selfie Guidance System in Good Head Postures==
<pdf width="1500px">https://ceur-ws.org/Vol-2068/symcollab2.pdf</pdf>
<pre>
Selfie Guidance System in Good Head Postures

Naihui Fang, Haoran Xie, Takeo Igarashi
Dept. of Computer Science, The University of Tokyo
fangnaihui88@hotmail.com, xiehr@acm.org, takeo@acm.org


ABSTRACT
Taking selfies has become a popular and pervasive activity on smart mobile devices. However, it is still difficult for the average user to take a good selfie; it remains a time-consuming and tedious task on ordinary mobile devices, especially for those who are not good at selfies. To reduce this difficulty, this work proposes an interactive selfie application with multiple user interfaces that improve user satisfaction when taking selfies. Our system helps average users take selfies by providing visual and voice guidance toward the proper head postures for good selfies. A crowdsourcing-based preprocessing step is used to evaluate the score space of possible head postures over hundreds of virtual selfies. In the interactive application, we adopt a geometric approach to estimate the user's current head posture. Our user studies show that the proposed selfie user interface can help common users take good selfies and improves user satisfaction.

Figure 1: Our guidance system includes visual and voice user interfaces which guide the common user to achieve a satisfying selfie based on crowdsourcing results.

Author Keywords
Selfie; head posture; crowdsourcing; user interface

ACM Classification Keywords
D.2.3. Design Tools and Techniques: User Interface

INTRODUCTION
The selfie is a type of self-portrait, usually a photograph taken with a mobile camera at arm's length. It is one of the most direct ways to show oneself to others, as well as to record one's daily life for one's own benefit. Selfies became more and more popular with the development of social media and now play an important role in daily life. Along with this popularity, taking a good selfie has become a pressing problem. People want to look good on social media, but many of them have trouble taking selfies: they often spend a long time and take a large number of selfies trying to get one good shot that makes them look attractive and distinctive. In particular, it is difficult to find a good head posture for a selfie, and it is not easy to reproduce the same head posture for the next one.

With the development of computer-vision technology, especially face recognition and facial-feature extraction, various selfie applications have emerged to help the average user take better selfies on mobile platforms. For example, photo corrections and enhancements can be applied to specific areas of the face as a post-process [7]. Recently, researchers proposed approaches that enhance selfies by suggesting expressions [16] and face geometries [15] to users in real time. However, a user interface that guides common users to take good selfies is still absent from all these previous works.

We propose an interactive selfie user interface that gives users clear suggestions for taking selfies on mobile platforms, based on crowdsourcing results. Many factors contribute to a good selfie, such as lighting, background, and the user's expression. In contrast to a recent selfie system that considers face position and lighting [10], in this work we study head posture (orientation), the most important and most controllable factor in selfie-taking. Note that the other factors share commonalities in user-interface design and can therefore benefit from this work.

In this work, we first conducted crowdsourcing tasks to define good head postures. To exclude factors other than head posture, we generated many virtual selfies from 3D human models and collected selfie scores from crowd workers; a virtual selfie receives a higher score when workers consider it more attractive. Based on the crowdsourcing results, we developed a real-time user interface that extracts facial features from each frame and estimates the user's head posture in the real world. A geometry-model approach [6] was adopted to implement real-time head-pose estimation from the facial-feature information extracted in each frame. The system computes the closest candidates among the good head postures from the crowdsourcing results, so that it can instruct the user to achieve a good selfie under the guidance of a reference human head. This work is expected to reduce the difficulty of taking good selfies, thus improving user satisfaction while taking selfies.

The main contributions of this work are: we proposed a novel approach to support selfie-taking by providing real-time suggestions on head postures; we collected head-posture data for selfie-taking from crowdsourcing results obtained with hundreds of virtual selfies rendered from 3D models; and we implemented multiple ways of interaction between the user and the proposed interfaces, as illustrated in Figure 1.

©2018. Copyright for the individual papers remains with the authors. Copying permitted for private and academic purposes. SymCollab '18, March 11, 2018, Tokyo, Japan.

RELATED WORK
Selfie Systems. Creamcam [11] and YouCam [4] apply effects to facial features or add makeup onto the human face based on facial-feature extraction algorithms. The smile-shutter function of the Sony DSC T300 provides support for facial expressions. An interactive selfie application was proposed to suggest face size, face position, and lighting direction [10]. In contrast to these works, our system focuses on one important selfie factor, head posture, in real time.

Photo Editing. An interactive photomontage system was proposed to create a composite set of images of the same objects, resulting in a single image for subsequent enhancement [1]. A similar approach was proposed to obtain good flash images using a pair of flash and non-flash images [2].

Aesthetics Feedback. Aesthetics feedback was given for photos taken by mobile users, covering composition, color combination, and aesthetic rating [14]. An aesthetic evaluation approach based on machine learning used a peer-rated online photo-sharing website as its input dataset [5].

Assistive Technology. Other research considers assistive technology to support the interaction between users and user interfaces. Voykinska et al. studied how blind people interact with social networks [13]. Researchers have also proposed approaches in which remote paid workers help blind people through camera devices and the internet [3]. Similar research asked the crowd to answer visual questions for blind people [9].

FRAMEWORK
The framework of this work consists of an offline part and an online part, as shown in Figure 2.

Figure 2: The framework of our selfie guidance system includes offline crowdsourcing tasks and the online selfie guidance system.

For the offline part, we explored good head postures from crowdsourcing results. We generated hundreds of virtual selfies with different head postures (the two types of virtual selfies used in this paper were created from 3D models in Unity3D). Among these virtual selfies, we defined "good" head postures as those whose selfies scored higher than 7. For the best selfie position, we generated virtual selfies at different distances from the camera, evaluated their attractiveness, and chose the distance with the highest score.

For the online part, we extract the user's facial features in each frame and build a 3D facial model to calculate the user's head posture with a geometric approach in real time. Our system calculates the good head posture closest to the user's current head posture and suggests it as the ideal pose for the user. To communicate this suggestion, our system shows a visual guide in the user interface that directs the motion of the user's head toward the ideal head posture. Voice instructions are also provided to help the user move their head. After the user achieves the ideal head posture, the system gives a voice notification and takes the selfie automatically. This helps the user take a good selfie more easily, reducing the difficulty of taking good selfies and enhancing user satisfaction.

Crowdsourcing Tasks for Head Posture
We designed two crowdsourcing tasks on CrowdFlower to explore good head postures. In the first task, crowd workers were asked to score each selfie's attractiveness from 1 to 10, where 10 is strongly attractive and 1 is not attractive at all. Each selfie was scored by 5 participants, and we adopted the average score to represent its attractiveness. The second task asked users to evaluate the attractiveness of selfies taken at different distances from the camera. We used 3D virtual models instead of real selfies for evaluation because virtual selfies allow accurate control of head posture, whereas obtaining real selfies with exact head postures is difficult. Using virtual selfies also suppresses other factors influencing the selfies, so attractiveness can be evaluated specifically. To construct a relatively comprehensive selfie database covering various head postures, we created a database of virtual selfies comprising 486 different head postures, with the pitch angle ranging over (-40, 40), the roll angle over (-40, 40), and the yaw angle over (-30, 20) degrees; we sampled virtual selfies every 10 degrees in each dimension.
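This sampling is a regular 3D grid: 9 pitch values x 9 roll values x 6 yaw values = 486 postures. A minimal sketch in Python (our own illustration, not the authors' code) that enumerates this grid:

    from itertools import product

    pitch_values = range(-40, 41, 10)   # 9 samples: -40, -30, ..., 40
    roll_values = range(-40, 41, 10)    # 9 samples
    yaw_values = range(-30, 21, 10)     # 6 samples: -30, -20, ..., 20

    # Each posture is a (pitch, roll, yaw) triple in degrees.
    postures = list(product(pitch_values, roll_values, yaw_values))
    assert len(postures) == 486         # matches the size of the selfie database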
The results of the crowdsourcing task are shown on the right of Figure 3. The green points represent head postures rated higher than 7, the red points represent head postures rated higher than 5 but lower than 7, and the grey points represent head postures rated lower than 5.

Figure 3: Virtual selfies that obtained good, medium, and bad scores in the crowdsourcing results.

In the first crowdsourcing task, among the 486 selfies, the 58 with the highest scores were defined as good head orientations. We labeled head postures that scored higher than 7 as "good", those scoring 5-7 as medium, and those under 5 as bad. The proposed selfie guidance system aims to take "good" selfies; thus, we used the "good" head postures to set up the system. Selected samples of selfies with good, medium, and bad scores are shown in Figure 3. From these results we found that selfies with a frontal face achieved the highest scores. However, common users would not be satisfied with a single head pose, so more choices of head poses are preferred.
In the second crowdsourcing task, we generated 6 virtual selfies at different distances, chosen with the length of a human arm in mind, and defined the best distance as the one whose selfie scored highest. We accepted a larger tolerance for the distance for two reasons. First, we estimate the distance only roughly, from the area of the triangle formed by the far corners of the eyes and the nose; this may be affected by head posture, possibly rendering it inaccurate. Second, the results of this task show that distance does not affect the selfie as strongly as head posture does: selfies over a fairly large range of distances receive similar scores.
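As a sketch of this rough distance proxy (our own illustration): the area of the triangle spanned by the two far eye corners and the nose tip shrinks as the face moves away from the camera, so it can serve as a coarse distance estimate.

    def triangle_area(p1, p2, p3):
        """Area of the triangle spanned by three (x, y) landmark points
        (shoelace formula); p1, p2 = far eye corners, p3 = nose tip."""
        (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
        return abs((x2 - x1) * (y3 - y1) - (x3 - x1) * (y2 - y1)) / 2.0

    # Larger area -> face closer to the camera; mapping the area to a metric
    # distance would need per-device calibration (not specified in the paper).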
To obtain a continuous assessment of head postures, we interpolated the crowdsourcing results into a score function over head postures using trilinear interpolation. Figure 4 illustrates the interpolation results (displayed as several slices). The multi-colored bar maps color to score: the red regions show high-scoring head postures, and the blue regions show head postures with relatively low scores. Both datasets contain large portions of relatively good selfies as rated by the score function derived from the crowdsourcing results. Since both datasets are comprised of selfies or photographs that are considered relatively attractive, the experimental results verify that head posture itself has a significant impact on the aesthetics of a selfie.

Figure 4: Interpolation of the results of the crowdsourcing task.
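As a sketch of how such a score function can be built, the snippet below performs trilinear interpolation over the posture grid with SciPy; the score array is a placeholder for the averaged crowd scores, and all names are ours.

    import numpy as np
    from scipy.interpolate import RegularGridInterpolator

    pitch = np.arange(-40, 41, 10)   # 9 grid values
    roll = np.arange(-40, 41, 10)    # 9 grid values
    yaw = np.arange(-30, 21, 10)     # 6 grid values

    # Placeholder for the 9 x 9 x 6 array of averaged crowd scores (1-10).
    scores = np.random.uniform(1.0, 10.0, size=(9, 9, 6))

    # method="linear" on a 3D regular grid gives trilinear interpolation.
    score_fn = RegularGridInterpolator((pitch, roll, yaw), scores, method="linear")

    # Continuous score for an arbitrary head posture inside the sampled range:
    print(score_fn([[12.5, -7.0, 4.2]]))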
Facial Feature Extraction Based on Face Landmarks
In this work, facial-feature extraction plays an important role in estimating the user's head posture. Many researchers have worked on this problem in order to attain high accuracy in the recognition of human faces and facial features. In this work, we extract eye, mouth, and nose feature points based on dlib [8], which uses 68 points to describe a human face. The detector is feature-based: HOG features are combined with a linear classifier, an image pyramid, and a sliding-window detection scheme.
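A minimal sketch of this extraction step with dlib's Python bindings (our illustration, not the authors' code; the predictor file name and landmark indices follow the publicly available standard 68-point model):

    import dlib

    detector = dlib.get_frontal_face_detector()   # HOG + linear classifier
    predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

    img = dlib.load_rgb_image("frame.jpg")        # one camera frame
    for face in detector(img, 1):                 # 1 = upsample once (image pyramid)
        shape = predictor(img, face)              # 68 landmark points
        left_eye_outer = shape.part(36)           # far corner of the left eye
        right_eye_outer = shape.part(45)          # far corner of the right eye
        nose_tip = shape.part(30)
        mouth_left = shape.part(48)
        mouth_right = shape.part(54)
        print(left_eye_outer.x, left_eye_outer.y)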
Head Posture Estimation Based on a Geometry Model
Head-pose estimation is the key part of this work: it estimates the user's head orientation from the facial features we extracted. The head-movement suggestions provided in the user interface are all based on these estimates. In this work, we estimate the head posture using a geometry model [12]. Before describing the estimation, we first describe how we represent head posture. Generally, we define it as the position and the orientation of the head. The position of the head can be calculated directly from the image; thus, we focus on how to calculate the orientation from a single image (frame). Head orientation can be described by roll, yaw, and pitch angles. The three axes are orthogonal, so we can calculate each angle separately. The proposed geometry model assumes that the head is bilaterally symmetric and that the two far corners of the eyes and the two mouth corners form a plane; this plane also defines a facial normal that shows the direction of the nose, and hence the direction of the head. In contrast to other head-pose estimation algorithms (tracking approaches, etc.), the proposed geometry approach does not require series data from previous frames to obtain the head pose. Another family of approaches, direct landmark fitting, is quite similar to ours; however, by establishing a facial model, our approach provides a relatively simple way to calculate each angle of the head pose, which is sufficient to prove the concept in our research.
Facial Model Construction
We define $L_f$ to be the distance from the far corner of the eyes to the corner of the mouth in the facial model, $L_n$ the distance from the nose tip to the nose base, $L_m$ the distance from the nose base to the mouth, and $L_e$ the distance between the two far corners of the eyes. Note that $l_f$, $l_n$, $l_m$, $l_e$ denote the corresponding projected values. We calculate the ratios $R_m = L_m / L_f$ and $R_n = L_n / L_f$ to define the facial model. Because these ratios are quite stable across different faces, constant values can be adopted in their place unless very high orientation accuracy is required; for a typical face, $R_m = 0.6$ and $R_n = 0.4$. The ratios $R_m$ and $R_n$ are used to calculate the slant as shown in the next section; a more detailed description can be found in [6].

Estimating the User's Head Posture
We represent a head posture with a tilt $\tau$, the angle between the symmetry axis and the facial normal (projected into the 2D image), and a slant $\sigma$, the angle between the normal of the image and the facial normal. The facial normal can be calculated as

    $\hat{n} = [\sin\sigma\cos\tau, \; \sin\sigma\sin\tau, \; -\cos\sigma]$    (1)

The slant $\sigma$ can be calculated from the normal of the image $d = [x_{normal}, y_{normal}, z_{normal}]$ as $\cos\sigma = |z_{normal}|$ ($z_{normal}$ is a normalized value). $z_{normal}$ can be expressed using two values $v_1$ and $v_2$, as shown in Eq. (2), and the projection matrix $(I - dd^T)$:

    $z_{normal}^2 = \dfrac{R_n^2 - v_1 - 2 v_2 R_n^2 + \sqrt{(v_1 - R_n^2)^2 + 4 v_1 v_2 R_n^2}}{2(1 - v_2) R_n^2}$    (2)

Here $v_1 = (l_n / l_f)^2$ and $v_2 = \dfrac{y_{normal}^2 \, z_{normal}^2}{(1 - y_{normal}^2)(1 - z_{normal}^2)}$.

The roll angle of the head can be calculated simply from the differences between the left-eye and right-eye coordinates, which we obtain from the facial-feature extraction algorithm. We use an arctan function to turn the ratio of the coordinate differences into degrees:

    $\theta_r = \arctan\left(\dfrac{y_l - y_r}{x_l - x_r}\right)$    (3)

Calculating the yaw and pitch angles involves the facial normal obtained above; $x$, $y$, $z$ denote the components of the facial normal defined in Eq. (1) ($x = \sin\sigma\cos\tau$, $y = \sin\sigma\sin\tau$, $z = -\cos\sigma$, $r = \sqrt{x^2 + y^2 + z^2}$):

    $\theta_y = \arccos\left(\dfrac{\sqrt{x^2 + y^2}}{r}\right), \quad \theta_p = \arccos\left(\dfrac{z}{r}\right)$    (4)

These three angles describe the head orientation of the end user in the real world. Using this head-orientation information and the crowdsourcing results, the system can provide further head-movement suggestions to the user in real time.

Figure 5: Experimental results running on a smartphone.
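As a compact illustration of Eqs. (1), (3), and (4), the sketch below computes roll, yaw, and pitch from the eye-corner coordinates and from a slant/tilt pair assumed to have been recovered already via Eq. (2); the code and its names are ours, not the authors'.

    import math

    def head_orientation(sigma, tau, x_l, y_l, x_r, y_r):
        # Eq. (1): facial normal n = [sin(s)cos(t), sin(s)sin(t), -cos(s)]
        x = math.sin(sigma) * math.cos(tau)
        y = math.sin(sigma) * math.sin(tau)
        z = -math.cos(sigma)
        r = math.sqrt(x * x + y * y + z * z)   # = 1 for the unit normal

        # Eq. (3): roll from the line through the two far eye corners
        # (atan2 is a numerically safer form of the arctan of the ratio).
        theta_r = math.atan2(y_l - y_r, x_l - x_r)

        # Eq. (4): yaw and pitch from the facial-normal components.
        theta_y = math.acos(math.sqrt(x * x + y * y) / r)
        theta_p = math.acos(z / r)
        return theta_r, theta_y, theta_p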
USER INTERFACE
We implemented the selfie guidance system on iOS 9.2 using Swift and Objective-C to instruct the user to achieve a head posture defined as good by the crowdsourcing tasks. The system applies the face-landmark algorithm to process each camera frame and adopts the eye, nose, and mouth points to estimate the head posture. For the first frame, the system extracts the facial features to initialize the face-geometry model. It then calculates the head posture from the facial features of each frame together with the face-geometry model. The whole system is described in Algorithm 1. The user interface of our selfie guidance system consists of a visual user interface and voice instructions. We combined these two methods of interacting with the user to provide new user experiences and to make the instructions easier to understand while taking selfies. For a pilot study, we implemented two selfie guidance systems: one with only the visual user interface, and one with both the visual and voice user interfaces. In their feedback, the participants reported that following the voice instructions was interesting and convenient.

Algorithm 1: Real-time processing for taking a selfie
input: Camera frame including a human face.
output: A good selfie stored in the album.
1 Define good head postures by the crowdsourcing approach using 486 virtual selfies with different head postures, resulting in a set of 58 good head postures.
2 Extract the eye, nose, and mouth points from the frame with the face-landmark algorithm, which uses HOG features combined with a linear classifier, to estimate a vector S (S contains the coordinates of each facial point).
3 Use the facial features extracted from the frame to establish the geometry facial model F.
4 Based on F and the facial-feature information, estimate the head posture Pc from the current frame, i.e., the current head posture of the user.
5 Calculate the Euclidean distances between Pc and each good head posture Pg (defined by the crowdsourcing task) and find the minimum, which defines "the closest good head posture" Pcg (the system calculates the closest good head posture once, until the frame count fc is reset).
6 Compare each angle of Pcg with that of Pc to provide the corresponding suggestions.
7 Go back to step 2 if Pc does not equal Pcg (within a tolerance ∆d).
8 Terminate the loop when Pc is close enough to Pcg (within ∆d); take the photo, store it to the album, and reset the frame count fc.
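Steps 5, 7, and 8 of Algorithm 1 reduce to a nearest-neighbor query and a tolerance test in (roll, yaw, pitch) space. A minimal sketch follows (the code and names are ours; the tolerance value is an assumption, since the paper does not state ∆d):

    import numpy as np

    def closest_good_posture(p_c, good_postures):
        """p_c: current (roll, yaw, pitch); good_postures: (58, 3) array."""
        dists = np.linalg.norm(good_postures - np.asarray(p_c), axis=1)
        return good_postures[np.argmin(dists)]     # P_cg in Algorithm 1

    def is_close_enough(p_c, p_cg, delta_d=5.0):   # tolerance in degrees (assumed)
        return np.linalg.norm(np.asarray(p_c) - np.asarray(p_cg)) <= delta_d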
Visual User Interface
The visual user interface is implemented as shown in Figure 6. To reduce the operational difficulty of head movement, the user interface provides suggestions for the roll, yaw, and pitch angles separately, so the user can move their head in one direction at a time. With the combination of these head movements, the user can easily achieve a good head posture. The arrows shown at the top and bottom of the user interface suggest the pitch angle and ask users to tilt their head up or down slightly. The arrows on the right and left sides of the screen instruct users to turn their head right or left slightly. The semicircular arrows instruct the user to tilt his or her head right or left. These arrows keep showing in the user interface until the user's head reaches the right position.

Figure 6: Visual user interface of the developed selfie guidance system.

Voice User Interface
For the voice instructions, the selfie-supporting system provides spoken guidance to help the user understand how to move their head to match the good head posture. There are four types of voice instructions, which suggest the proper head orientation and distance and check for the presence of a smile. For the head-orientation suggestions, the system says "tilt head right slightly!" and "tilt head left slightly!" for the roll angle; "tilt head up slightly!" and "tilt head down slightly!" for the pitch angle; and "turn left slightly!" and "turn right slightly!" for the yaw angle. These suggestions correspond to the arrows on the visual user interface. When the user has the right posture and is ready to take selfies, the system announces "Perfect! Let's take a photo, 3, 2, 1, cheese!" This notifies the user when the selfie will be taken and provides extra time to adjust their expression.

USER STUDY
We designed a comparison user study to evaluate the validity of the developed selfie-supporting system. (We also designed a pilot study comparing the system with only visual instructions to the system with both visual and voice instructions.) Of the 8 participants, 2 were female and 6 were male, all aged between 22 and 28. Some of them spend a great deal of time taking selfies, while others take very few or none. We asked them to use our developed system and a normal camera to experience the difference, and to fill out a questionnaire evaluating the system in several respects.

There were two tasks, which the participants completed in a random order. Task 1 asked the participants to take several selfies using a normal camera, without any suggestions during the process; they were asked to choose 5 selfies they were satisfied with, and could take as many selfies as they wanted until they had 5 satisfactory ones. Task 2 asked the participants to use the developed selfie-supporting system to take 5 or 6 selfies. (Some users took more than 5 selfies because we were trying to avoid multiple selfies with the same head posture.) In this task, they followed the on-screen suggestions about their head posture and also heard voice instructions while taking selfies.

In the experiment, 4 of the participants were asked to complete Task 1 first and then Task 2, while the other 4 completed Task 2 first and then Task 1. In addition to this comparison study, we also asked them to use our selfie-supporting system on a tripod, to experience taking selfies with the visual- and voice-instruction user interface at a distance beyond arm's reach. After all the tasks above, they were asked to fill out a questionnaire evaluating their satisfaction with the system and the user interface, and the attractiveness of the head postures provided by the system.
RESULTS
We applied the score function derived from the crowdsourcing results to the outcomes of the user study. Figures 7 and 8 show examples in which users took selfies with a normal camera and with our proposed system. From the results, we can see that the selfies taken with our system received higher scores on average. For example, for the second user, the average score of selfies taken with a normal camera was 6.94, while the selfies taken with our system scored 7.7 on average. We divided the users into two groups according to whether or not they were good at taking selfies. Figure 7 shows examples from users who were not good at taking selfies; the results show that the proposed system helps these users take better selfies overall. In these examples, the average score of selfies taken with our system increased by around 30% compared to the selfies taken with a normal camera.

Figure 7: Scoring results of users' selfies taken with a normal camera and with our system, for those who are not good at taking selfies.

Figure 8 shows examples from users with relatively better selfie-taking skill. They obtained higher scores even with a normal camera; however, our system still helped them achieve higher scores on average. The proposed system also supports them by providing more head postures, giving them options for different appearances.

Figure 8: Scoring results of users' selfies taken with a normal camera and with our system, for those who are good at taking selfies. Note that our proposed system offers more head-posture options than a normal camera.

The questions in our questionnaire covered system satisfaction, feedback on the head postures provided by the system, the visual user interface, and the voice instructions. Users scored each statement from 1 to 5, where 1 represents "strongly disagree" and 5 represents "strongly agree." The evaluation and analysis of the system were done according to the results of the user study. Figure 10 (a) shows the comparison of the normal camera and our system: 75% of users think that our system is more convenient, and 25% think the two systems are equivalent. Figure 10 (b) shows what users think of the head postures provided by our system: 87.5% of users agree that our selfie-supporting system provides more choices when taking selfies (25% strongly agree; 62.5% agree), and 12.5% neither agree nor disagree.

Figure 10: The user feedback comparing the normal camera and the developed selfie-supporting system (a); the user feedback on whether users have more choices of head postures with our selfie-supporting system (b).

Figure 9 (a) shows the user feedback concerning the developed selfie-support system. We asked users whether the application helped them take better selfies, whether the head-posture suggestion function was needed in their opinion, and whether they thought it was easier to take good selfies with the supporting system. The chart shows the average score for each question, and we can see that the users were quite satisfied with the system. Figure 9 (b) shows the average evaluation scores for the head postures, confirming that users consider the head postures provided by the application to be good. Figure 9 (c) shows the average evaluation scores for the visual user interface; here we asked users whether they could understand the meaning of the arrows and whether they could tell which arrows suggested the roll, yaw, and pitch angles, respectively. From these results, we find that users can understand the meaning of the suggestions and follow them well.

Figure 9: The average score of the user feedback for the application (a); for the head postures provided by the application (b); and for the visual user interface (c).

Figure 11 shows the evaluations of the voice instructions. To evaluate whether the voice instructions work while the user is taking selfies, we asked about two aspects: whether users were aware of the voice instructions, and whether they understood their meaning. We also asked whether the voice instructions were interesting (Q24), helpful (Q25), and easy to understand (Q26). According to the results, users were able to notice and understand the voice instructions well.

Figure 11: The user feedback about the voice instructions.

CONCLUSION
In this work, we proposed a real-time selfie-support system that provides suggestions for the user's head posture, aiming to help users take good selfies. Two crowdsourcing tasks were conducted to explore good head orientations and the best distance from the camera (position). In each crowdsourcing task, we generated virtual selfies to create a database and asked crowd workers to score each virtual selfie by its attractiveness. After the crowdsourcing tasks, we developed a selfie system that provides real-time suggestions for head postures. We implemented both a visual user interface, in which arrows suggest head movements, and voice instructions, which also guide users on how and in which direction to move their head. A user study was conducted in which participants took selfies with two cameras (a normal camera and the camera with our head-posture support system). According to the study results, the proposed system can help users take good selfies and improves user satisfaction.

In this work, we explored good head postures using a 3D virtual model and applied the crowdsourcing results to all users. Aesthetic evaluations may vary with cultural background or knowledge, so this could be further studied by considering more parameters for specific users. The integration of other selfie factors into our selfie guidance system is not difficult to implement. In the future, it is worth exploring the relationship between substantial amounts of selfie data and methods for subjective crowdsourcing evaluation using deep learning to achieve satisfying selfies. An alternative way to evaluate selfie quality could be an example-based approach that compares reviews from social networks, or that asks crowd workers to score the selfies taken by our system against those from other work with similar functionality. The crowdsourcing-based evaluation approach could also be applied to other areas, such as evaluating the accuracy of face recognition.
ACKNOWLEDGMENTS
We thank all participants for joining our case studies, and Morph 3D for sharing the head model in Figure 3. This work was supported by JST CREST Grant Number JPMJCR17A1 and JSPS KAKENHI Grant Number JP17H06574, Japan. Haoran Xie is funded by the Epson International Foundation.

REFERENCES
1. Aseem Agarwala, Mira Dontcheva, Maneesh Agrawala, Steven Drucker, Alex Colburn, Brian Curless, David Salesin, and Michael Cohen. 2004. Interactive Digital Photomontage. ACM Trans. Graph. 23, 3 (Aug. 2004), 294-302. DOI: http://dx.doi.org/10.1145/1015706.1015718

2. Amit Agrawal, Ramesh Raskar, Shree K. Nayar, and Yuanzhen Li. 2005. Removing Photography Artifacts Using Gradient Projection and Flash-exposure Sampling. ACM Trans. Graph. 24, 3 (July 2005), 828-835. DOI: http://dx.doi.org/10.1145/1073204.1073269

3. Jeffrey P. Bigham, Chandrika Jayant, Hanjie Ji, Greg Little, Andrew Miller, Robert C. Miller, Aubrey Tatarowicz, Brandyn White, Samuel White, and Tom Yeh. 2010. VizWiz: Nearly Real-time Answers to Visual Questions. In Proceedings of the 2010 International Cross Disciplinary Conference on Web Accessibility (W4A '10). ACM, New York, NY, USA, Article 24, 2 pages. DOI: http://dx.doi.org/10.1145/1805986.1806020

4. Perfect Corp. 2015. YouCam Perfect. (2015). http://www.perfectcorp.com/stat/product/CyberLink_app/Perfect_Corp/enu/design/perfectCorp/index.jsp

5. Ritendra Datta, Dhiraj Joshi, Jia Li, and James Z. Wang. 2006. Studying Aesthetics in Photographic Images Using a Computational Approach. In Proceedings of the 9th European Conference on Computer Vision - Volume Part III (ECCV'06). Springer-Verlag, Berlin, Heidelberg, 288-301. DOI: http://dx.doi.org/10.1007/11744078_23

6. Andrew Gee and Roberto Cipolla. 1994. Determining the gaze of faces in images. Image and Vision Computing 12, 10 (1994), 639-647. DOI: http://dx.doi.org/10.1016/0262-8856(94)90039-6
7. Neel Joshi, Wojciech Matusik, Edward H. Adelson, and David J. Kriegman. 2010. Personal Photo Enhancement Using Example Images. ACM Trans. Graph. 29, 2, Article 12 (April 2010), 15 pages. DOI: http://dx.doi.org/10.1145/1731047.1731050

8. V. Kazemi and J. Sullivan. 2014. One millisecond face alignment with an ensemble of regression trees. In 2014 IEEE Conference on Computer Vision and Pattern Recognition. 1867-1874. DOI: http://dx.doi.org/10.1109/CVPR.2014.241

9. Walter S. Lasecki, Phyo Thiha, Yu Zhong, Erin Brady, and Jeffrey P. Bigham. 2013. Answering Visual Questions with Conversational Crowd Assistants. In Proceedings of the 15th International ACM SIGACCESS Conference on Computers and Accessibility (ASSETS '13). ACM, New York, NY, USA, Article 18, 8 pages. DOI: http://dx.doi.org/10.1145/2513383.2517033

10. Qifan Li and Daniel Vogel. 2017. Guided Selfies Using Models of Portrait Aesthetics. In Proceedings of the 2017 Conference on Designing Interactive Systems (DIS '17). ACM, New York, NY, USA, 179-190. DOI: http://dx.doi.org/10.1145/3064663.3064700

11. LoftLab. 2016. CreamCam Selfie Filter. (2016). http://creamcamapp.com

12. Baback Moghaddam and Alexander P. Pentland. 1994. Face recognition using view-based and modular eigenspaces. Proc. SPIE 2277 (1994), 12-21.

13. Violeta Voykinska, Shiri Azenkot, Shaomei Wu, and Gilly Leshed. 2016. How Blind People Interact with Visual Content on Social Networking Services. In Proceedings of the 19th ACM Conference on Computer-Supported Cooperative Work & Social Computing (CSCW '16). ACM, New York, NY, USA, 1584-1595. DOI: http://dx.doi.org/10.1145/2818048.2820013

14. Lei Yao, Poonam Suryanarayan, Mu Qiao, James Z. Wang, and Jia Li. 2012. OSCAR: On-Site Composition and Aesthetics Feedback through Exemplars for Photographers. International Journal of Computer Vision 96, 3 (2012), 353-383.

15. Mei-Chen Yeh and Hsiao-Wei Lin. 2014. Virtual Portraitist: Aesthetic Evaluation of Selfies Based on Angle. In Proceedings of the 22nd ACM International Conference on Multimedia (MM '14). ACM, New York, NY, USA, 221-224. DOI: http://dx.doi.org/10.1145/2647868.2656401

16. Jun-Yan Zhu, Aseem Agarwala, Alexei A. Efros, Eli Shechtman, and Jue Wang. 2014. Mirror Mirror: Crowdsourcing Better Portraits. ACM Transactions on Graphics (SIGGRAPH Asia 2014) 33, 6 (2014).

</pre>