=Paper=
{{Paper
|id=Vol-2068/symcollab2
|storemode=property
|title=Selfie Guidance System in Good Head Postures
|pdfUrl=https://ceur-ws.org/Vol-2068/symcollab2.pdf
|volume=Vol-2068
|authors=Naihui Fang,Haoran Xie,Takeo Igarashi
|dblpUrl=https://dblp.org/rec/conf/iui/FangXI18
}}
==Selfie Guidance System in Good Head Postures==
<pdf width="1500px">https://ceur-ws.org/Vol-2068/symcollab2.pdf</pdf>
<pre>
Selfie Guidance System in Good Head Postures

Naihui Fang, Haoran Xie, Takeo Igarashi
Dept. of Computer Science, The University of Tokyo
fangnaihui88@hotmail.com, xiehr@acm.org, takeo@acm.org


ABSTRACT
Taking selfies has become a popular and pervasive activity on smart mobile devices. However, it is still difficult for the average user to take a good selfie; it remains a time-consuming and tedious task on ordinary mobile devices, especially for those who are not good at selfies. To reduce this difficulty, this work proposes an interactive selfie application with multiple user interfaces that improve user satisfaction when taking selfies. Our system helps average users take selfies by providing visual and voice guidance toward the proper head postures for good selfies. A crowdsourcing-based preprocessing step is used to evaluate the score space of possible head postures over hundreds of virtual selfies. In the interactive application, we adopt a geometric approach to estimate the user's current head posture. Our user studies show that the proposed selfie user interface can help common users take good selfies and improves user satisfaction.

Figure 1: Our guidance system includes visual and voice user interfaces which guide the common user to achieve a satisfying selfie based on crowdsourcing results.

Author Keywords
Selfie; head posture; crowdsourcing; user interface

ACM Classification Keywords
D.2.3. Design Tools and Techniques: User Interface

INTRODUCTION
The selfie is a type of self-portrait, usually a photograph taken with a mobile camera at arm's length. It is one of the most direct ways to show oneself to others, as well as to record one's daily life for one's own benefit. Selfies became more and more popular with the development of social media and now play an important role in daily life. Along with this popularity, taking a good selfie has become a pressing problem. People want to look good on social media, but many of them have trouble taking selfies: they often spend a long time and take a large number of selfies trying to get one good shot that makes them look attractive and distinctive. In particular, it is difficult to find a good head posture for a selfie, and it is not easy to reproduce the same head posture for the next one.

With the development of computer-vision technology, especially face recognition and facial-feature extraction, various selfie applications have emerged to help the average user take better selfies on mobile platforms. For example, photo corrections and enhancements can be applied to specific areas of the face as a post-process [7]. Recently, researchers proposed approaches that enhance selfies by suggesting expressions [16] and face geometries [15] to users in real time. However, a user interface that guides common users to take good selfies is still absent from all these previous works.

We propose an interactive selfie user interface that gives users clear suggestions for taking selfies on mobile platforms, based on crowdsourcing results. Many factors contribute to a good selfie, such as lighting, background, and the user's expression. In contrast to a recent selfie system that considers face position and lighting [10], in this work we study head posture (orientation), the most important and most controllable factor in selfie-taking. Note that the other factors share commonalities in user-interface design and can therefore benefit from this work.

In this work, we first conducted crowdsourcing tasks to define good head postures. To exclude factors other than head posture, we generated many virtual selfies from 3D human models and collected selfie scores from crowd workers; a virtual selfie receives a higher score when workers consider it more attractive. Based on the crowdsourcing results, we developed a real-time user interface that extracts facial features from each frame and estimates the user's head posture in the real world. A geometry-model approach [6] was adopted to implement real-time head-pose estimation from the facial-feature information extracted in each frame. The system computes the closest candidates among the good head postures from the crowdsourcing results, so that it can instruct the user to achieve a good selfie under the guidance of a reference human head. This work is expected to reduce the difficulty of taking good selfies, thus improving user satisfaction while taking selfies.

The main contributions of this work are: we proposed a novel approach to support selfie-taking by providing real-time suggestions on head postures; we collected head-posture data for selfie-taking from crowdsourcing results obtained with hundreds of virtual selfies rendered from 3D models; and we implemented multiple ways of interaction between the user and the proposed interfaces, as illustrated in Figure 1.

©2018. Copyright for the individual papers remains with the authors. Copying permitted for private and academic purposes. SymCollab '18, March 11, 2018, Tokyo, Japan.

RELATED WORK
Selfie Systems. Creamcam [11] and YouCam [4] apply effects to facial features or add makeup onto the human face based on facial-feature extraction algorithms. The smile-shutter function of the Sony DSC T300 provides support for facial expressions. An interactive selfie application was proposed to suggest face size, face position, and lighting direction [10]. In contrast to these works, our system focuses on one important selfie factor, head posture, in real time.

Photo Editing. An interactive photomontage system was proposed to create a composite set of images of the same objects, resulting in a single image for subsequent enhancement [1]. A similar approach was proposed to obtain good flash images using a pair of flash and non-flash images [2].

Aesthetics Feedback. Aesthetics feedback was given for photos taken by mobile users, covering composition, color combination, and aesthetic rating [14]. An aesthetic evaluation approach based on machine learning used a peer-rated online photo-sharing website as its input dataset [5].

Assistive Technology. Other research considers assistive technology to support the interaction between users and user interfaces. Voykinska et al. studied how blind people interact with social networks [13]. Researchers have also proposed approaches in which remote paid workers help blind people through camera devices and the internet [3]. Similar research asked the crowd to answer visual questions for blind people [9].

FRAMEWORK
The framework of this work consists of an offline part and an online part, as shown in Figure 2.

Figure 2: The framework of our selfie guidance system includes offline crowdsourcing tasks and the online selfie guidance system.

For the offline part, we explored good head postures from crowdsourcing results. We generated hundreds of virtual selfies with different head postures (the two types of virtual selfies used in this paper were created from 3D models in Unity3D). Among these virtual selfies, we defined "good" head postures as those whose selfies scored higher than 7. For the best selfie position, we generated virtual selfies at different distances from the camera, evaluated their attractiveness, and chose the distance with the highest score.

For the online part, we extract the user's facial features in each frame and build a 3D facial model to calculate the user's head posture with a geometric approach in real time. Our system calculates the good head posture closest to the user's current head posture and suggests it as the ideal pose for the user. To communicate this suggestion, our system shows a visual guide in the user interface that directs the motion of the user's head toward the ideal head posture. Voice instructions are also provided to help the user move their head. After the user achieves the ideal head posture, the system gives a voice notification and takes the selfie automatically. This helps the user take a good selfie more easily, reducing the difficulty of taking good selfies and enhancing user satisfaction.

Crowdsourcing Tasks for Head Posture
We designed two crowdsourcing tasks on CrowdFlower to explore good head postures. In the first task, crowd workers were asked to score each selfie's attractiveness from 1 to 10, where 10 is strongly attractive and 1 is not attractive at all. Each selfie was scored by 5 participants, and we adopted the average score to represent its attractiveness. The second task asked users to evaluate the attractiveness of selfies taken at different distances from the camera. We used 3D virtual models instead of real selfies for evaluation because virtual selfies allow accurate control of head posture, whereas obtaining real selfies with exact head postures is difficult. Using virtual selfies also suppresses other factors influencing the selfies, so attractiveness can be evaluated specifically. To construct a relatively comprehensive selfie database covering various head postures, we created a database of virtual selfies comprising 486 different head postures, with the pitch angle ranging over (-40, 40), the roll angle over (-40, 40), and the yaw angle over (-30, 20) degrees; we sampled virtual selfies every 10 degrees in each dimension.
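This sampling is a regular 3D grid: 9 pitch values x 9 roll values x 6 yaw values = 486 postures. A minimal sketch in Python (our own illustration, not the authors' code) that enumerates this grid:

    from itertools import product

    pitch_values = range(-40, 41, 10)   # 9 samples: -40, -30, ..., 40
    roll_values = range(-40, 41, 10)    # 9 samples
    yaw_values = range(-30, 21, 10)     # 6 samples: -30, -20, ..., 20

    # Each posture is a (pitch, roll, yaw) triple in degrees.
    postures = list(product(pitch_values, roll_values, yaw_values))
    assert len(postures) == 486         # matches the size of the selfie database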
The results of the crowdsourcing task are shown on the right of Figure 3. The green points represent head postures rated higher than 7, the red points represent head postures rated higher than 5 but lower than 7, and the grey points represent head postures rated lower than 5.

Figure 3: Virtual selfies that obtained good, medium, and bad scores in the crowdsourcing results.

In the first crowdsourcing task, among the 486 selfies, the 58 with the highest scores were defined as good head orientations. We labeled head postures that scored higher than 7 as "good", those scoring 5-7 as medium, and those under 5 as bad. The proposed selfie guidance system aims to take "good" selfies; thus, we used the "good" head postures to set up the system. Selected samples of selfies with good, medium, and bad scores are shown in Figure 3. From these results we found that selfies with a frontal face achieved the highest scores. However, common users would not be satisfied with a single head pose, so more choices of head poses are preferred.
In the second crowdsourcing task, we generated 6 virtual selfies at different distances, chosen with the length of a human arm in mind, and defined the best distance as the one whose selfie scored highest. We accepted a larger tolerance for the distance for two reasons. First, we estimate the distance only roughly, from the area of the triangle formed by the far corners of the eyes and the nose; this may be affected by head posture, possibly rendering it inaccurate. Second, the results of this task show that distance does not affect the selfie as strongly as head posture does: selfies over a fairly large range of distances receive similar scores.
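As a sketch of this rough distance proxy (our own illustration): the area of the triangle spanned by the two far eye corners and the nose tip shrinks as the face moves away from the camera, so it can serve as a coarse distance estimate.

    def triangle_area(p1, p2, p3):
        """Area of the triangle spanned by three (x, y) landmark points
        (shoelace formula); p1, p2 = far eye corners, p3 = nose tip."""
        (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
        return abs((x2 - x1) * (y3 - y1) - (x3 - x1) * (y2 - y1)) / 2.0

    # Larger area -> face closer to the camera; mapping the area to a metric
    # distance would need per-device calibration (not specified in the paper).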
To obtain a continuous assessment of head postures, we interpolated the crowdsourcing results into a score function over head postures using trilinear interpolation. Figure 4 illustrates the interpolation results (displayed as several slices). The multi-colored bar maps color to score: the red regions show high-scoring head postures, and the blue regions show head postures with relatively low scores. Both datasets contain large portions of relatively good selfies as rated by the score function derived from the crowdsourcing results. Since both datasets are comprised of selfies or photographs that are considered relatively attractive, the experimental results verify that head posture itself has a significant impact on the aesthetics of a selfie.

Figure 4: Interpolation of the results of the crowdsourcing task.
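As a sketch of how such a score function can be built, the snippet below performs trilinear interpolation over the posture grid with SciPy; the score array is a placeholder for the averaged crowd scores, and all names are ours.

    import numpy as np
    from scipy.interpolate import RegularGridInterpolator

    pitch = np.arange(-40, 41, 10)   # 9 grid values
    roll = np.arange(-40, 41, 10)    # 9 grid values
    yaw = np.arange(-30, 21, 10)     # 6 grid values

    # Placeholder for the 9 x 9 x 6 array of averaged crowd scores (1-10).
    scores = np.random.uniform(1.0, 10.0, size=(9, 9, 6))

    # method="linear" on a 3D regular grid gives trilinear interpolation.
    score_fn = RegularGridInterpolator((pitch, roll, yaw), scores, method="linear")

    # Continuous score for an arbitrary head posture inside the sampled range:
    print(score_fn([[12.5, -7.0, 4.2]]))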
Facial Feature Extraction Based on Face Landmarks
In this work, facial-feature extraction plays an important role in estimating the user's head posture. Many researchers have worked on this problem in order to attain high accuracy in the recognition of human faces and facial features. In this work, we extract eye, mouth, and nose feature points based on dlib [8], which uses 68 points to describe a human face. The detector is feature-based: HOG features are combined with a linear classifier, an image pyramid, and a sliding-window detection scheme.
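A minimal sketch of this extraction step with dlib's Python bindings (our illustration, not the authors' code; the predictor file name and landmark indices follow the publicly available standard 68-point model):

    import dlib

    detector = dlib.get_frontal_face_detector()   # HOG + linear classifier
    predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

    img = dlib.load_rgb_image("frame.jpg")        # one camera frame
    for face in detector(img, 1):                 # 1 = upsample once (image pyramid)
        shape = predictor(img, face)              # 68 landmark points
        left_eye_outer = shape.part(36)           # far corner of the left eye
        right_eye_outer = shape.part(45)          # far corner of the right eye
        nose_tip = shape.part(30)
        mouth_left = shape.part(48)
        mouth_right = shape.part(54)
        print(left_eye_outer.x, left_eye_outer.y)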
Head Posture Estimation Based on a Geometry Model
Head-pose estimation is the key part of this work: it estimates the user's head orientation from the facial features we extracted. The head-movement suggestions provided in the user interface are all based on these estimates. In this work, we estimate the head posture using a geometry model [12]. Before describing the estimation, we first describe how we represent head posture. Generally, we define it as the position and the orientation of the head. The position of the head can be calculated directly from the image; thus, we focus on how to calculate the orientation from a single image (frame). Head orientation can be described by roll, yaw, and pitch angles. The three axes are orthogonal, so we can calculate each angle separately. The proposed geometry model assumes that the head is bilaterally symmetric and that the two far corners of the eyes and the two mouth corners form a plane; this plane also defines a facial normal that shows the direction of the nose, and hence the direction of the head. In contrast to other head-pose estimation algorithms (tracking approaches, etc.), the proposed geometry approach does not require series data from previous frames to obtain the head pose. Another family of approaches, direct landmark fitting, is quite similar to ours; however, by establishing a facial model, our approach provides a relatively simple way to calculate each angle of the head pose, which is sufficient to prove the concept in our research.
Facial Model Construction
We define $L_f$ to be the distance from the far corner of the eyes to the corner of the mouth in the facial model, $L_n$ the distance from the nose tip to the nose base, $L_m$ the distance from the nose base to the mouth, and $L_e$ the distance between the two far corners of the eyes. Note that $l_f$, $l_n$, $l_m$, $l_e$ denote the corresponding projected values. We calculate the ratios $R_m = L_m / L_f$ and $R_n = L_n / L_f$ to define the facial model. Because these ratios are quite stable across different faces, constant values can be adopted in their place unless very high orientation accuracy is required; for a typical face, $R_m = 0.6$ and $R_n = 0.4$. The ratios $R_m$ and $R_n$ are used to calculate the slant as shown in the next section; a more detailed description can be found in [6].

Estimating the User's Head Posture
We represent a head posture with a tilt $\tau$, the angle between the symmetry axis and the facial normal (projected into the 2D image), and a slant $\sigma$, the angle between the normal of the image and the facial normal. The facial normal can be calculated as

    $\hat{n} = [\sin\sigma\cos\tau, \; \sin\sigma\sin\tau, \; -\cos\sigma]$    (1)

The slant $\sigma$ can be calculated from the normal of the image $d = [x_{normal}, y_{normal}, z_{normal}]$ as $\cos\sigma = |z_{normal}|$ ($z_{normal}$ is a normalized value). $z_{normal}$ can be expressed using two values $v_1$ and $v_2$, as shown in Eq. (2), and the projection matrix $(I - dd^T)$:

    $z_{normal}^2 = \dfrac{R_n^2 - v_1 - 2 v_2 R_n^2 + \sqrt{(v_1 - R_n^2)^2 + 4 v_1 v_2 R_n^2}}{2(1 - v_2) R_n^2}$    (2)

Here $v_1 = (l_n / l_f)^2$ and $v_2 = \dfrac{y_{normal}^2 \, z_{normal}^2}{(1 - y_{normal}^2)(1 - z_{normal}^2)}$.

The roll angle of the head can be calculated simply from the differences between the left-eye and right-eye coordinates, which we obtain from the facial-feature extraction algorithm. We use an arctan function to turn the ratio of the coordinate differences into degrees:

    $\theta_r = \arctan\left(\dfrac{y_l - y_r}{x_l - x_r}\right)$    (3)

Calculating the yaw and pitch angles involves the facial normal obtained above; $x$, $y$, $z$ denote the components of the facial normal defined in Eq. (1) ($x = \sin\sigma\cos\tau$, $y = \sin\sigma\sin\tau$, $z = -\cos\sigma$, $r = \sqrt{x^2 + y^2 + z^2}$):

    $\theta_y = \arccos\left(\dfrac{\sqrt{x^2 + y^2}}{r}\right), \quad \theta_p = \arccos\left(\dfrac{z}{r}\right)$    (4)

These three angles describe the head orientation of the end user in the real world. Using this head-orientation information and the crowdsourcing results, the system can provide further head-movement suggestions to the user in real time.

Figure 5: Experimental results running on a smartphone.
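As a compact illustration of Eqs. (1), (3), and (4), the sketch below computes roll, yaw, and pitch from the eye-corner coordinates and from a slant/tilt pair assumed to have been recovered already via Eq. (2); the code and its names are ours, not the authors'.

    import math

    def head_orientation(sigma, tau, x_l, y_l, x_r, y_r):
        # Eq. (1): facial normal n = [sin(s)cos(t), sin(s)sin(t), -cos(s)]
        x = math.sin(sigma) * math.cos(tau)
        y = math.sin(sigma) * math.sin(tau)
        z = -math.cos(sigma)
        r = math.sqrt(x * x + y * y + z * z)   # = 1 for the unit normal

        # Eq. (3): roll from the line through the two far eye corners
        # (atan2 is a numerically safer form of the arctan of the ratio).
        theta_r = math.atan2(y_l - y_r, x_l - x_r)

        # Eq. (4): yaw and pitch from the facial-normal components.
        theta_y = math.acos(math.sqrt(x * x + y * y) / r)
        theta_p = math.acos(z / r)
        return theta_r, theta_y, theta_p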
USER INTERFACE
We implemented the selfie guidance system on iOS 9.2 using Swift and Objective-C to instruct the user to achieve a head posture defined as good by the crowdsourcing tasks. The system applies the face-landmark algorithm to process each camera frame and adopts the eye, nose, and mouth points to estimate the head posture. For the first frame, the system extracts the facial features to initialize the face-geometry model. It then calculates the head posture from the facial features of each frame together with the face-geometry model. The whole system is described in Algorithm 1. The user interface of our selfie guidance system consists of a visual user interface and voice instructions. We combined these two methods of interacting with the user to provide new user experiences and to make the instructions easier to understand while taking selfies. For a pilot study, we implemented two selfie guidance systems: one with only the visual user interface, and one with both the visual and voice user interfaces. In their feedback, the participants reported that following the voice instructions was interesting and convenient.

Algorithm 1: Real-time processing for taking a selfie
input: Camera frame including a human face.
output: A good selfie stored in the album.
1 Define good head postures by the crowdsourcing approach using 486 virtual selfies with different head postures, resulting in a set of 58 good head postures.
2 Extract the eye, nose, and mouth points from the frame with the face-landmark algorithm, which uses HOG features combined with a linear classifier, to estimate a vector S (S contains the coordinates of each facial point).
3 Use the facial features extracted from the frame to establish the geometry facial model F.
4 Based on F and the facial-feature information, estimate the head posture Pc from the current frame, i.e., the current head posture of the user.
5 Calculate the Euclidean distances between Pc and each good head posture Pg (defined by the crowdsourcing task) and find the minimum, which defines "the closest good head posture" Pcg (the system calculates the closest good head posture once, until the frame count fc is reset).
6 Compare each angle of Pcg with that of Pc to provide the corresponding suggestions.
7 Go back to step 2 if Pc does not equal Pcg (within a tolerance ∆d).
8 Terminate the loop when Pc is close enough to Pcg (within ∆d); take the photo, store it to the album, and reset the frame count fc.
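Steps 5, 7, and 8 of Algorithm 1 reduce to a nearest-neighbor query and a tolerance test in (roll, yaw, pitch) space. A minimal sketch follows (the code and names are ours; the tolerance value is an assumption, since the paper does not state ∆d):

    import numpy as np

    def closest_good_posture(p_c, good_postures):
        """p_c: current (roll, yaw, pitch); good_postures: (58, 3) array."""
        dists = np.linalg.norm(good_postures - np.asarray(p_c), axis=1)
        return good_postures[np.argmin(dists)]     # P_cg in Algorithm 1

    def is_close_enough(p_c, p_cg, delta_d=5.0):   # tolerance in degrees (assumed)
        return np.linalg.norm(np.asarray(p_c) - np.asarray(p_cg)) <= delta_d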
Visual User Interface
The visual user interface is implemented as shown in Figure 6. To reduce the operational difficulty of head movement, the user interface provides suggestions for the roll, yaw, and pitch angles separately, so the user can move their head in one direction at a time. With the combination of these head movements, the user can easily achieve a good head posture. The arrows shown at the top and bottom of the user interface suggest the pitch angle and ask users to tilt their head up or down slightly. The arrows on the right and left sides of the screen instruct users to turn their head right or left slightly. The semicircular arrows instruct the user to tilt his or her head right or left. These arrows keep showing in the user interface until the user's head reaches the right position.

Figure 6: Visual user interface of the developed selfie guidance system.

Voice User Interface
For the voice instructions, the selfie-supporting system provides spoken guidance to help the user understand how to move their head to match the good head posture. There are four types of voice instructions, which suggest the proper head orientation and distance and check for the presence of a smile. For the head-orientation suggestions, the system says "tilt head right slightly!" and "tilt head left slightly!" for the roll angle; "tilt head up slightly!" and "tilt head down slightly!" for the pitch angle; and "turn left slightly!" and "turn right slightly!" for the yaw angle. These suggestions correspond to the arrows on the visual user interface. When the user has the right posture and is ready to take selfies, the system announces "Perfect! Let's take a photo, 3, 2, 1, cheese!" This notifies the user when the selfie will be taken and provides extra time to adjust their expression.

USER STUDY
We designed a comparison user study to evaluate the validity of the developed selfie-supporting system. (We also designed a pilot study comparing the system with only visual instructions to the system with both visual and voice instructions.) Of the 8 participants, 2 were female and 6 were male, all aged between 22 and 28. Some of them spend a great deal of time taking selfies, while others take very few or none. We asked them to use our developed system and a normal camera to experience the difference, and to fill out a questionnaire evaluating the system in several respects.

There were two tasks, which the participants completed in a random order. Task 1 asked the participants to take several selfies using a normal camera, without any suggestions during the process; they were asked to choose 5 selfies they were satisfied with, and could take as many selfies as they wanted until they had 5 satisfactory ones. Task 2 asked the participants to use the developed selfie-supporting system to take 5 or 6 selfies. (Some users took more than 5 selfies because we were trying to avoid multiple selfies with the same head posture.) In this task, they followed the on-screen suggestions about their head posture and also heard voice instructions while taking selfies.

In the experiment, 4 of the participants were asked to complete Task 1 first and then Task 2, while the other 4 completed Task 2 first and then Task 1. In addition to this comparison study, we also asked them to use our selfie-supporting system on a tripod, to experience taking selfies with the visual- and voice-instruction user interface at a distance beyond arm's reach. After all the tasks above, they were asked to fill out a questionnaire evaluating their satisfaction with the system and the user interface, and the attractiveness of the head postures provided by the system.
RESULTS
We applied the score function derived from the crowdsourcing results to the outcomes of the user study. Figures 7 and 8 show examples in which users took selfies with a normal camera and with our proposed system. From the results, we can see that the selfies taken with our system received higher scores on average. For example, for the second user, the average score of selfies taken with a normal camera was 6.94, while the selfies taken with our system scored 7.7 on average. We divided the users into two groups according to whether or not they were good at taking selfies. Figure 7 shows examples from users who were not good at taking selfies; the results show that the proposed system helps these users take better selfies overall. In these examples, the average score of selfies taken with our system increased by around 30% compared to the selfies taken with a normal camera.

Figure 7: Scoring results of users' selfies taken with a normal camera and with our system, for those who are not good at taking selfies.

Figure 8 shows examples from users with relatively better selfie-taking skill. They obtained higher scores even with a normal camera; however, our system still helped them achieve higher scores on average. The proposed system also supports them by providing more head postures, giving them options for different appearances.

Figure 8: Scoring results of users' selfies taken with a normal camera and with our system, for those who are good at taking selfies. Note that our proposed system offers more head-posture options than a normal camera.

The questions in our questionnaire covered system satisfaction, feedback on the head postures provided by the system, the visual user interface, and the voice instructions. Users scored each statement from 1 to 5, where 1 represents "strongly disagree" and 5 represents "strongly agree." The evaluation and analysis of the system were done according to the results of the user study. Figure 10 (a) shows the comparison of the normal camera and our system: 75% of users think that our system is more convenient, and 25% think the two systems are equivalent. Figure 10 (b) shows what users think of the head postures provided by our system: 87.5% of users agree that our selfie-supporting system provides more choices when taking selfies (25% strongly agree; 62.5% agree), and 12.5% neither agree nor disagree.

Figure 10: The user feedback comparing the normal camera and the developed selfie-supporting system (a); the user feedback on whether users have more choices of head postures with our selfie-supporting system (b).

Figure 9 (a) shows the user feedback concerning the developed selfie-support system. We asked users whether the application helped them take better selfies, whether the head-posture suggestion function was needed in their opinion, and whether they thought it was easier to take good selfies with the supporting system. The chart shows the average score for each question, and we can see that the users were quite satisfied with the system. Figure 9 (b) shows the average evaluation scores for the head postures, confirming that users consider the head postures provided by the application to be good. Figure 9 (c) shows the average evaluation scores for the visual user interface; here we asked users whether they could understand the meaning of the arrows and whether they could tell which arrows suggested the roll, yaw, and pitch angles, respectively. From these results, we find that users can understand the meaning of the suggestions and follow them well.

Figure 9: The average score of the user feedback for the application (a); for the head postures provided by the application (b); and for the visual user interface (c).

Figure 11 shows the evaluations of the voice instructions. To evaluate whether the voice instructions work while the user is taking selfies, we asked about two aspects: whether users were aware of the voice instructions, and whether they understood their meaning. We also asked whether the voice instructions were interesting (Q24), helpful (Q25), and easy to understand (Q26). According to the results, users were able to notice and understand the voice instructions well.

Figure 11: The user feedback about the voice instructions.

CONCLUSION
In this work, we proposed a real-time selfie-support system that provides suggestions for the user's head posture, aiming to help users take good selfies. Two crowdsourcing tasks were conducted to explore good head orientations and the best distance from the camera (position). In each crowdsourcing task, we generated virtual selfies to create a database and asked crowd workers to score each virtual selfie by its attractiveness. After the crowdsourcing tasks, we developed a selfie system that provides real-time suggestions for head postures. We implemented both a visual user interface, in which arrows suggest head movements, and voice instructions, which also guide users on how and in which direction to move their head. A user study was conducted in which participants took selfies with two cameras (a normal camera and the camera with our head-posture support system). According to the study results, the proposed system can help users take good selfies and improves user satisfaction.

In this work, we explored good head postures using a 3D virtual model and applied the crowdsourcing results to all users. Aesthetic evaluations may vary with cultural background or knowledge, so this could be further studied by considering more parameters for specific users. The integration of other selfie factors into our selfie guidance system is not difficult to implement. In the future, it is worth exploring the relationship between substantial amounts of selfie data and methods for subjective crowdsourcing evaluation using deep learning to achieve satisfying selfies. An alternative way to evaluate selfie quality could be an example-based approach that compares reviews from social networks, or that asks crowd workers to score the selfies taken by our system against those from other work with similar functionality. The crowdsourcing-based evaluation approach could also be applied to other areas, such as evaluating the accuracy of face recognition.
ACKNOWLEDGMENTS
We thank all participants for joining our case studies, and Morph 3D for sharing the head model in Figure 3. This work was supported by JST CREST Grant Number JPMJCR17A1 and JSPS KAKENHI Grant Number JP17H06574, Japan. Haoran Xie is funded by the Epson International Foundation.

REFERENCES
1. Aseem Agarwala, Mira Dontcheva, Maneesh Agrawala, Steven Drucker, Alex Colburn, Brian Curless, David Salesin, and Michael Cohen. 2004. Interactive Digital Photomontage. ACM Trans. Graph. 23, 3 (Aug. 2004), 294-302. DOI: http://dx.doi.org/10.1145/1015706.1015718

2. Amit Agrawal, Ramesh Raskar, Shree K. Nayar, and Yuanzhen Li. 2005. Removing Photography Artifacts Using Gradient Projection and Flash-exposure Sampling. ACM Trans. Graph. 24, 3 (July 2005), 828-835. DOI: http://dx.doi.org/10.1145/1073204.1073269

3. Jeffrey P. Bigham, Chandrika Jayant, Hanjie Ji, Greg Little, Andrew Miller, Robert C. Miller, Aubrey Tatarowicz, Brandyn White, Samuel White, and Tom Yeh. 2010. VizWiz: Nearly Real-time Answers to Visual Questions. In Proceedings of the 2010 International Cross Disciplinary Conference on Web Accessibility (W4A '10). ACM, New York, NY, USA, Article 24, 2 pages. DOI: http://dx.doi.org/10.1145/1805986.1806020

4. Perfect Corp. 2015. YouCam Perfect. (2015). http://www.perfectcorp.com/stat/product/CyberLink_app/Perfect_Corp/enu/design/perfectCorp/index.jsp

5. Ritendra Datta, Dhiraj Joshi, Jia Li, and James Z. Wang. 2006. Studying Aesthetics in Photographic Images Using a Computational Approach. In Proceedings of the 9th European Conference on Computer Vision - Volume Part III (ECCV'06). Springer-Verlag, Berlin, Heidelberg, 288-301. DOI: http://dx.doi.org/10.1007/11744078_23

6. Andrew Gee and Roberto Cipolla. 1994. Determining the gaze of faces in images. Image and Vision Computing 12, 10 (1994), 639-647. DOI: http://dx.doi.org/10.1016/0262-8856(94)90039-6
7. Neel Joshi, Wojciech Matusik, Edward H. Adelson, and David J. Kriegman. 2010. Personal Photo Enhancement Using Example Images. ACM Trans. Graph. 29, 2, Article 12 (April 2010), 15 pages. DOI: http://dx.doi.org/10.1145/1731047.1731050

8. V. Kazemi and J. Sullivan. 2014. One millisecond face alignment with an ensemble of regression trees. In 2014 IEEE Conference on Computer Vision and Pattern Recognition. 1867-1874. DOI: http://dx.doi.org/10.1109/CVPR.2014.241

9. Walter S. Lasecki, Phyo Thiha, Yu Zhong, Erin Brady, and Jeffrey P. Bigham. 2013. Answering Visual Questions with Conversational Crowd Assistants. In Proceedings of the 15th International ACM SIGACCESS Conference on Computers and Accessibility (ASSETS '13). ACM, New York, NY, USA, Article 18, 8 pages. DOI: http://dx.doi.org/10.1145/2513383.2517033

10. Qifan Li and Daniel Vogel. 2017. Guided Selfies Using Models of Portrait Aesthetics. In Proceedings of the 2017 Conference on Designing Interactive Systems (DIS '17). ACM, New York, NY, USA, 179-190. DOI: http://dx.doi.org/10.1145/3064663.3064700

11. LoftLab. 2016. CreamCam Selfie Filter. (2016). http://creamcamapp.com

12. Baback Moghaddam and Alexander P. Pentland. 1994. Face recognition using view-based and modular eigenspaces. Proc. SPIE 2277 (1994), 12-21.

13. Violeta Voykinska, Shiri Azenkot, Shaomei Wu, and Gilly Leshed. 2016. How Blind People Interact with Visual Content on Social Networking Services. In Proceedings of the 19th ACM Conference on Computer-Supported Cooperative Work & Social Computing (CSCW '16). ACM, New York, NY, USA, 1584-1595. DOI: http://dx.doi.org/10.1145/2818048.2820013

14. Lei Yao, Poonam Suryanarayan, Mu Qiao, James Z. Wang, and Jia Li. 2012. OSCAR: On-Site Composition and Aesthetics Feedback through Exemplars for Photographers. International Journal of Computer Vision 96, 3 (2012), 353-383.

15. Mei-Chen Yeh and Hsiao-Wei Lin. 2014. Virtual Portraitist: Aesthetic Evaluation of Selfies Based on Angle. In Proceedings of the 22nd ACM International Conference on Multimedia (MM '14). ACM, New York, NY, USA, 221-224. DOI: http://dx.doi.org/10.1145/2647868.2656401

16. Jun-Yan Zhu, Aseem Agarwala, Alexei A. Efros, Eli Shechtman, and Jue Wang. 2014. Mirror Mirror: Crowdsourcing Better Portraits. ACM Transactions on Graphics (SIGGRAPH Asia 2014) 33, 6 (2014).

</pre>