Automated Collection of High Quality 3D Avatar Images

James Kim, Darryl D'Souza, Roman V. Yampolskiy
Computer Engineering and Computer Science
University of Louisville, Louisville, KY
jhkim012@louisville.edu, darryl.dsouza@louisville.edu, roman.yampolskiy@louisville.edu

Abstract

CAPTCHAs are security tests designed to let users easily identify themselves as humans; however, as research shows (Bursztein et al. 2010), these tests are not necessarily easy for humans to pass. As a result, a new test is proposed that asks users to identify images of real humans among images of 3D virtual avatars, creating a potentially unsolvable obstacle for computers and a quick, easy verification for humans. To provide test cases for this new test, an automated bot is used to collect images of 3D avatars from Evolver.com.

INTRODUCTION

Today's commonly used security test for distinguishing real humans from bots on the Internet is the CAPTCHA. Two noticeable issues result from using such tests. The first is that these tests are gradually becoming less reliable as exposed design flaws and improved algorithms allow CAPTCHAs to be cracked. For example, deCAPTCHA, a CAPTCHA-cracking program developed by a group of Stanford researchers, successfully decrypted image- and audio-based text CAPTCHAs from a number of well-known sites. The success rate varies from site to site, but some examples include 66% on Visa's Authorize.net, 43% on eBay, and 73% on captcha.com (Bursztein, Martin, and Mitchell 2011). Each website uses a different combination of text distortion factors in its CAPTCHAs; however, design flaws were evident in most tests. The deCAPTCHA program was able to achieve its 43% success rate on eBay because the researchers exploited the fact that eBay's CAPTCHAs used a fixed number of digits and regular font sizes (Bursztein, Martin, and Mitchell 2011).

Second, according to research, humans encounter difficulties in solving CAPTCHAs, especially audio CAPTCHAs. Perhaps the tests are becoming more challenging for human users than they are for computers, considering deCAPTCHA's performance (Bursztein et al. 2010). The test proposed here is designed to be an unsolvable obstacle for computers while remaining easy for humans. One may even hypothesize that there is a correlation between the combination of factors that determine a CAPTCHA's insolvability to computers and the level of ease with which a human can solve it.

Other, simpler, more novel and user-friendly types of CAPTCHAs do exist, such as the Animal Species Image Recognition for Restricting Access (ASIRRA) test or the "drag and drop" CAPTCHA (Geetha and Ragavi 2011). However, text-based CAPTCHAs are more widely used than sound- or image-based tests because of the numerous distortion methods available to increase their insolvability to computers, despite the added difficulty these methods create for humans (Geetha and Ragavi 2011). Each test has its own design flaws, so security tests that present the highest level of insolvability to computers while being easiest for humans to solve are sought.

Computers have the potential to detect text in a variety of distorted images, but their ability to differentiate between two objects of different colors or textures is a different matter entirely. On this foundation, a test was proposed that builds a small table of images of virtual avatars and real humans and asks the user to select those that show real humans. Generally, the test would include two or more images of real humans in order to increase the difficulty for computers.

In addition, the colors, textures, and other factors of the images would be varied in order to leave as few exploitable regularities as possible. For example, the images of virtual avatars or humans may be black and white, pastel, or even sketched. Furthermore, various clothing and accessories would potentially add more to the challenge. Inevitably, the test requires an algorithm that demands critical thinking, one that answers the question "How does a real human appear?"
In order to begin testing, a dataset of high quality 3D virtual avatars was required. Virtual communities such as Second Life or Active Worlds provide an extensive selection of body characteristics and articles of clothing for avatar creation. The tradeoff for this extensive customization is the detail, or quality, of the avatars. Considering that both of these virtual communities have massive online user bases, it is understandable that the system requirements for running the client application are very low, since hardware costs are often a barrier to accessing online content.

Evolver, a 3D avatar generator that allows users to create and customize their own avatars, was found to be the best choice. Its avatars are highly detailed and elaborate, with options to orient them into certain body positions, facial expressions, or animations. Although Evolver does not possess as extensive a selection of avatar characteristics as the other virtual worlds, its selection was large enough for our purposes.

In addition, one noticeable difference between Evolver and the other virtual worlds is the option to morph different physical attributes together. Evolver uses a large database of pre-generated body parts, called a virtual gene pool, to morph two physical attributes together in morph bins. Users choose the degree to which the attributes are morphed by using a slider, which measures the domination of one attribute over the other (Boulay 2006).

The avatar images were collected using Sikuli, an image-recognition-based scripting environment that automates graphical user interfaces (Lawton 2010). Sikuli's integrated development environment features an image capturing tool that allows users to collect the screenshots required as parameters for its functions.
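To make this scripting style concrete, the following is a minimal sketch of image-driven Sikuli (Jython) automation of the kind described above. It assumes the functions Sikuli's IDE provides globally (setAutoWaitTimeout, wait, click, exists) and hypothetical screenshot names such as evolver_male_button.png captured with the IDE's image tool; it is illustrative rather than a reproduction of the actual collection script.

    # Minimal Sikuli (Jython) sketch of image-driven GUI automation.
    # The .png names are hypothetical placeholders captured with
    # Sikuli's screenshot tool; they are not part of Evolver itself.
    import random

    setAutoWaitTimeout(15)  # seconds to wait for images to appear

    # Randomly start a male or female avatar, as the collection bot does.
    gender_button = random.choice(["evolver_male_button.png",
                                   "evolver_female_button.png"])
    wait(gender_button)     # block until the button is visible on screen
    click(gender_button)    # image-recognition-based click

    # Continue only once the Face tab has finished loading.
    if exists("face_tab.png", 30):
        click("face_tab.png")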
A Sikuli script can be run continuously, but there is always some factor, such as server latency, that can cause an error, and such an error is a major setback to data collection because the script terminates. As a result, Evolver's simple and easily navigable user interface was another reason why it was selected over Second Life, Active Worlds, and other virtual communities. Evolver's interface divides physical attributes and articles of clothing into organized tabs, which decreases the possibility of failure when moving from one section to another.

Active Worlds in particular relies on scrolling, which at times significantly impeded data collection, because the possibility of failure increased whenever the script failed to scroll down and could not continue. In general, only a few images failed to be generated because the Sikuli script misrecognized certain images. In any case, when collecting avatar images from the web, server latency is a major factor in successful collection, as it can delay the timing of the actions dictated in the script.

METHODOLOGY

The Evolver avatar customization interface is somewhat unusual in that it prioritizes displaying explicit detail of a particular attribute over emphasizing the overall variety in a section. This visual design explains why, when the mouse hovers over an attribute, the image automatically enlarges to give the user a better view. Some physical attributes with variations, such as varying colors or patterns, show gray arrows on each side of the enlarged image, which allow the user to rotate through the variations instead of viewing all of them in a grid on a reloaded webpage.

At the start of execution, the script randomly clicks either the male or the female gender option to begin the avatar creation process. Once the webpage loads the Face tab, the script makes sure that the checkboxes for both male and female physical attributes are selected, so as to open more choices for a truly random avatar. Seven icons are present on the left side, each of which focuses on a specific area of the face (Figure 1). In each area, a slider enables the user to control the degree of attribute morphing. The random die button is clicked first to obtain two random face presets, and the slider is then dragged to a random location along its track to produce a random degree of morphing. The script clicks 65 pixels down from the center of the first selected icon in order to randomize the degree of morphing for each area.
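As a rough illustration of this face-randomization step, the sketch below clicks the random die, offsets 65 pixels below an area icon, and drags the morph slider to a random position. It again assumes Sikuli's built-in functions (find, click, dragDrop, Location) and hypothetical image names; the slider offset range is an assumption, not a measured value.

    # Sketch of the face-randomization step (Sikuli/Jython).
    # "random_die.png", "face_area_icon.png", and "morph_slider_handle.png"
    # are hypothetical screenshots of Evolver controls.
    import random

    click("random_die.png")                 # load two random face presets

    # Click 65 pixels below the center of the selected area icon,
    # which randomizes the degree of morphing for that area.
    icon_center = find("face_area_icon.png").getCenter()
    click(Location(icon_center.x, icon_center.y + 65))

    # Drag the morph slider handle to a random point along its track.
    handle = find("morph_slider_handle.png").getCenter()
    offset = random.randint(-80, 80)        # assumed track half-length in pixels
    dragDrop(Location(handle.x, handle.y),
             Location(handle.x + offset, handle.y))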
The script then proceeds to the Skin tab, on which a grid shows all the physical attributes for the skin. Under each tab there is a specific number of pages of attributes, so the script randomly clicks through the pages, given the range of pages, and then randomly selects an image "cell" within the grid. For most of the tabs, which share a similar grid design, there is a region within which the script acts to select an image. The region is the area defined by the x-y coordinate of its top left corner and the dimensions of the rectangle. Each tab has a differently sized region because of the varying dimensions of the physical attribute images.

Figure 1. Face Tab (Evolver.com)

Essentially, one attribute image within the grid is treated as a cell, and after randomly selecting a grid page, a random cell is chosen by moving the mouse cursor to the center of that cell. To confirm that the image has enlarged and that an image is actually located at the cursor coordinate, the script scans the screen for a yellow box with the text "Selected," which appears in the bottom right corner of the image. Other cases were also handled, such as the script failing to recognize within the default waiting period that the mouse was hovering over the image. Once the Selected box is confirmed, the script checks whether there are variations for the attribute by scanning for an unselected gray arrow on the right side of the image. If there are variations for that attribute, the script quickly rotates through them with a random number of clicks. It is worth noting that viewing images on each page, or rotating through the variations, responds quickly, which suggests that these operations run client-side. Once the random image is selected, the script proceeds to the next tab. Because the Eyes, Hair, and Clothing tabs are similar to the Skin tab, the same method of selection was used for each.
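The grid selection just described can be sketched as follows. The Region coordinates, cell size, and the screenshots of the yellow "Selected" marker and the gray variation arrow are hypothetical; as noted above, the real region and cell dimensions differ from tab to tab.

    # Sketch of random attribute selection inside a grid region (Sikuli/Jython).
    # Region coordinates, cell size, and image names are assumed values.
    import random

    grid = Region(300, 250, 600, 400)     # x, y, width, height of the attribute grid
    CELL_W, CELL_H = 120, 100             # assumed size of one attribute "cell"
    cols = grid.w // CELL_W
    rows = grid.h // CELL_H

    # Hover over the center of a randomly chosen cell so it enlarges.
    col = random.randint(0, cols - 1)
    row = random.randint(0, rows - 1)
    cell_center = Location(grid.x + col * CELL_W + CELL_W // 2,
                           grid.y + row * CELL_H + CELL_H // 2)
    hover(cell_center)

    # Confirm the enlargement by scanning for the yellow "Selected" box,
    # then rotate through any color/pattern variations a random number of times.
    if grid.exists("selected_marker.png", 5):
        if grid.exists("gray_right_arrow.png", 2):
            for _ in range(random.randint(0, 4)):
                click("gray_right_arrow.png")
        click(cell_center)                # apply the chosen attribute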
After the random hair selection, an image of the avatar's face is captured in PNG format from a preview window, which is opened with the zoom button in the preview. The resolution used to capture the image depends on the dimensions of the capture region. The avatar is rotated once toward the left side in order to capture a direct, face-on image. For the Body tab, the same method of image selection used in the Face tab is applied, because the two tabs' interfaces are so similar.

After the images for the Clothing tab, which contains sub-tabs for tops, bottoms, and shoes, are selected, an image of the avatar's body is captured. However, instead of the default position, which is a slightly angled arms-out pose, a face-on standing pose is chosen from the dropdown box in the zoomed preview window. Finally, the script selects "New Avatar," located near the top right corner of the web page, in order to repeat the process.

RESULTS

The program was run as continuously as possible over the course of seven days, and approximately 1030 successful images were captured, more of them of avatar bodies than of avatar faces. The initial days were primarily test runs used to find and resolve unforeseen bugs. More than 100 face images were unsuccessful, yet the script managed to continue the process and collect other, potentially successful samples. One interesting point is that there were significantly more unsuccessful images of faces than of bodies, which totaled about 10-20 errors at most. This is primarily because the method for adjusting the avatar's face was more problematic than that for the body: adjusting the avatar's head relied on clicking 'rotate left arrow' icons, which were often ignored by the server. Other major problems were longer load times and Sikuli's failure to recognize loading signals.

The source of most of these errors was server latency. Between every selection in the avatar customization process there is a variable loading time, during which a loading box appears in the center of the screen. Surprisingly, the loading time differs for each attribute image applied to the avatar, not merely for each type of physical attribute; the loading times for two different but similarly categorized physical attributes would vary. One could conjecture that the number of polygons an attribute adds to the avatar causes the variable loading time.
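One way to cope with this variable loading time, consistent with the loading-box behavior described above, is to wait for the loading indicator to disappear before taking a screenshot. The sketch below uses Sikuli's waitVanish and Screen.capture; the loading-box image, preview region, timeout, and save path are all assumptions, and capture() is assumed to return the path of a temporary PNG as in Sikuli's Jython API.

    # Sketch: wait out the variable loading time before capturing the preview.
    # "loading_box.png" and the preview region are hypothetical placeholders.
    import shutil

    preview = Region(820, 200, 400, 500)    # assumed avatar preview area

    def capture_preview(dest_path, timeout=60):
        # Wait until the loading box overlaid on the preview disappears.
        if not preview.waitVanish("loading_box.png", timeout):
            return False                    # still loading; skip this capture
        # capture() writes a temporary PNG and returns its path.
        tmp = Screen().capture(preview)
        shutil.move(tmp, dest_path)
        return True

    capture_preview("avatar_0001_face.png")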
These problems did not occur very frequently, but their potential occurrence required hourly checkups; otherwise, the script could fail if left unchecked. Overnight data collection was risky because of the lack of checkups, and even when overnight runs succeeded, a number of defective images were present in the dataset, for example, previews of avatars that still had the loading text in the center of the preview.

The other problem is executing the script on a different computer. Because of differing monitor resolutions, adjustments to the program were required. At minimum, the regions under each tab had to be changed, because each has a defined upper-left coordinate that is correct only at the previous computer's resolution. These coordinates must be redefined, or the entire collection process is thrown off.

CONCLUSION

Besides the unsuccessfully captured images, there were very rare occurrences of abnormal morphing (Figure 6). However, server latency was a greater concern, since images were often captured too early (Figures 5 and 8) because the loading text image was not detectable when overlaid in the center of the avatar preview. As a result, the program instead relied on detecting when the black area within the loading box disappeared before undertaking any action. However, instructions to rotate the avatar's head were often not recognized, and as a result most unsuccessfully captured face images remain at the default angle (Figure 3). By comparison, changes of body position were usually recognized by the server, resulting in more successful body images (Figure 7) than face images (Figure 4).

Avatar images collected in previous research are of relatively lower quality than those in this dataset. This is primarily due to the low graphics requirements adopted to open the virtual community to as many online users as possible without hardware being a limiting factor. Although low graphics settings result in basic avatar models, simple facial expressions can still be easily noticed and identified (Parke 1972). In more complex avatar models, such as those in this dataset, thousands of additional polygons allow more complex expressions to be shown.

Furthermore, gathering data from virtual communities is incredibly tedious, considering the number of other problems that exist, such as involuntary self-movement of the avatar (Oursler, Price, and Yampolskiy 2009). For example, the head or eyes of the avatar would constantly shift in place, and often the captured image would not be in the position displayed by the avatar in Figure 2. Comparing Figures 2 and 4, the images in this dataset are of higher quality and larger resolution.

Overall, this dataset could be repurposed not only for image-based CAPTCHA tests but also for facial expression recognition and analysis. With the addition of thousands of polygons, more complex facial expressions become possible. As such, biometric analysis could be performed on such avatars as a precursor to the analysis of real user biometric data.

Figure 2. Avatar image captured from Second Life
Figure 3. Unsuccessful Captured Face Image - Default Angle Unchanged
Figure 4. Successful Captured Face Image
Figure 5. Unsuccessful Captured Face Image - Loading Box in Center
Figure 6. Unsuccessful Captured Image - Abnormal Morph
Figure 7. Successful Captured Body Image
Figure 8. Unsuccessful Captured Body Image - Loading Box in Center

References

Boulay, Jacques-Andre. (2006) "Modeling: Evolver Character Builder." Virtual Reality News and Resources By/for the New World Architects. http://vroot.org/node/722. Retrieved on February 21, 2012.

Bursztein, Elie, Matthieu Martin, and John C. Mitchell. (2011) "Text-based CAPTCHA Strengths and Weaknesses." ACM Conference on Computer and Communications Security (CCS'2011), October 17-21, 2011, Chicago, USA.

Bursztein, Elie, Steven Bethard, Celine Fabry, John C. Mitchell, and Dan Jurafsky. (2010) "How Good Are Humans at Solving CAPTCHAs? A Large Scale Evaluation." 2010 IEEE Symposium on Security and Privacy, pp. 399-413, May 16-19, 2010, Berkeley, CA.

Geetha, G., and V. Ragavi. (2011) "CAPTCHA Celebrating Its Quattuordecennial - A Complete Reference." IJCSI International Journal of Computer Science Issues, Vol. 8, Issue 6, No. 2: 340-349.

Lawton, George. (2010) "Screen-Capture Programming: What You See Is What You Script." Computing Now, IEEE Computer Society, March 2010. Retrieved on February 22, 2012.

Oursler, Justin N., Mathew Price, and Roman V. Yampolskiy. (2009) "Parameterized Generation of Avatar Face Dataset." 14th International Conference on Computer Games: AI, Animation, Mobile, Interactive Multimedia, Educational & Serious Games (CGames'09), pp. 17-22, July 29-August 2, 2009, Louisville, KY.

Parke, Frederick I. (1972) "Computer Generated Animation of Faces." Proceedings of the ACM Annual Conference (ACM '72), Vol. 1, pp. 451-457. New York: ACM.