=Paper=
{{Paper
|id=None
|storemode=property
|title= Automated Collection of High Quality 3D Avatar Images
|pdfUrl=https://ceur-ws.org/Vol-841/submission_24.pdf
|volume=Vol-841
|dblpUrl=https://dblp.org/rec/conf/maics/KimDY12
}}
== Automated Collection of High Quality 3D Avatar Images==
James Kim, Darryl D’Souza, Roman V. Yampolskiy
Computer Engineering and Computer Science
University of Louisville, Louisville, KY
jhkim012@louisville.edu, darryl.dsouza@louisville.edu, roman.yampolskiy@louisville.edu
Abstract

CAPTCHAs are security tests designed to allow users to easily identify themselves as humans; however, as research shows (Bursztein et al. 2010), these tests aren’t necessarily easy for humans to pass. As a result, a new test is proposed that asks users to identify images of real humans among those of 3D virtual avatars, which would create a potentially unsolvable obstacle for computers and a quick, easy verification for humans. In order to provide test cases for this new test, an automated bot is used to collect images of 3D avatars from Evolver.com.

INTRODUCTION

The security test most commonly used today to distinguish real humans from bots on the Internet is the CAPTCHA. Two noticeable issues result from using such tests. First, these tests are gradually becoming less reliable as exposed design flaws and improved algorithms successfully crack CAPTCHAs. For example, a CAPTCHA cracking program called deCAPTCHA, which was developed by a group of Stanford researchers, successfully solved image- and audio-based text CAPTCHAs from a number of famous sites. The success rate of passing CAPTCHAs varies from site to site, but some examples include a 66% success rate on Visa’s Authorize.net, 43% on eBay, and 73% on captcha.com (Bursztein, Martin, and Mitchell 2011). Each website uses a combination of different text distortion factors in its CAPTCHAs; however, it was evident that design flaws were present in most tests. For example, the deCAPTCHA program was able to achieve a 43% success rate on eBay because the researchers were able to exploit the fact that eBay’s CAPTCHAs were using a fixed number of digits and regular font sizes (Bursztein, Martin, and Mitchell 2011).

Second, according to research, humans encounter difficulties in solving CAPTCHAs, especially audio CAPTCHAs. Perhaps the tests are becoming more challenging for the human users than they are for computers, considering deCAPTCHA’s performance (Bursztein et al. 2010). Such a test is designed to be an unsolvable obstacle for computers while remaining easy for humans. One may even hypothesize that there is a correlation between the combination of factors that determine a CAPTCHA’s insolvability to computers and the level of ease for a human to solve the test.

Other simpler, more novel, and more user-friendly types of CAPTCHAs do exist, such as the Animal Species Image Recognition for Restricting Access (ASIRRA) test or the ‘drag and drop’ CAPTCHA (Geetha and Ragavi 2011). However, text-based CAPTCHAs are more widely used than sound- or image-based tests because of the numerous variations of distortion methods available to increase the level of insolvability to computers, despite the added difficulty for humans to pass them (Geetha and Ragavi 2011). Each test has its own design flaws; thus, security tests that present the highest level of insolvability to computers and are easiest for humans to solve are sought.

Computers have the potential to detect text from a variety of distorted images, but their ability to differentiate between two objects in different colors or textures is a different matter entirely. With this as a foundation, a test was proposed that would create a small table using images of virtual avatars and real humans and ask the user to select those that are of real humans. Generally, the test would include two or more images of real humans in order to increase the difficulty for computers.

In addition, the colors, textures, and other factors of the images would be varied in order to leave a program as few reliable cues as possible. For example, the images of virtual avatars or humans may be black and white, pastel, or even sketched. Furthermore, various clothing and accessories would potentially add more to the challenge. Inevitably, passing the test would require an algorithm that demands critical thinking, one that answers the question “How does a real human appear?”

In order to begin testing, a dataset of high quality 3D virtual avatars was required. Virtual communities such as Second Life or ActiveWorlds provided an extensive selection of body characteristics and articles of clothing for avatar creation. The tradeoff for extensive customization was the detail or quality of the avatars. Considering that both of these virtual communities have massive online communities of users, it’s understandable that the
system requirements to run the client application are very low, since hardware costs are often a barrier to accessing online content.

Evolver, which is a 3D avatar generator that allows users to create and customize their own avatar, was found to be the best choice. The avatars were highly detailed and elaborate, with options to orient them into certain body positions, facial expressions, or animations. Although Evolver did not possess as extensive a selection of avatar characteristics as the other virtual worlds, its selections were large enough for our purposes.

In addition, one noticeable difference between Evolver and the other virtual worlds was the option to morph different physical attributes together. Evolver uses a large database of pre-generated body parts, called a virtual gene pool, to morph together two physical attributes in morph bins. Users choose the degree to which they are morphed together by using a slider, which measures the domination of one attribute over the other (Boulay 2006).

The avatar images were collected using Sikuli, which is an image recognition based scripting environment that automates graphical user interfaces (Lawton 2010). Sikuli’s integrated development environment features an image capturing tool that allows users to collect the images that its functions take as parameters.

A Sikuli script could be run continuously, but there is always some factor, such as server latency, that could cause an error, which would be a major setback to data collection because the script would terminate. As a result, Evolver’s simple and easily navigable user interface was another reason why Evolver was selected over Second Life, ActiveWorlds, and other virtual communities. Evolver’s interface divides physical attributes and articles of clothing into organized tabs, which decreases the possibility of failure when moving from one section to another.

ActiveWorlds in particular relied on scrolling, which at times significantly impeded data collection because the possibility of failure would increase when the script failed to scroll down to continue with the script. Generally, few images were able to be generated without the Sikuli script misrecognizing certain images. However, in any case of data collection of avatar images from the web, server latency is a huge factor in successful collection, as it can delay the timing of the actions dictated in the script.

METHODOLOGY

The Evolver avatar customization interface is somewhat unusual in that it prioritizes displaying explicit details of a particular attribute over emphasizing the overall variety in a section. This visual design explains why, when the mouse hovers over an attribute, the image automatically enlarges so that the user has a better view. Some physical attributes with variations, such as varying colors or patterns, show gray arrows on each side of the image when enlarged, which allow the user to rotate through the variations instead of viewing all of them in a grid on a reloaded webpage.

At its execution, the script would randomly click either the male or female gender option to begin the avatar creation process. Once the webpage loaded the Face tab, the script would make sure that both checkboxes for male and female physical attributes were selected so as to open more choices for a truly random avatar. Seven icons were present on the left side, each of which focused on a specific area of the face (Figure 1). In each area, a slider enables the user to control the degree of attribute morphing. The random die button is clicked first to obtain two random face presets, and the slider is then dragged to a random location on the line to set a random degree of morphing. The script would click 65 pixels down from the center of the first selected icon in order to randomize the degree of morphing for each area.

Figure 1. Face Tab (Evolver.com)

The script would proceed to the Skin tab, on which there is a grid that shows all the physical attributes for the skin. Under each tab there is a specific number of pages of attributes. As a result, the script would randomly click through the pages given the range of the pages, and would randomly select an image “cell” within the grid. Under most of the tabs that share a similar grid design, there would be a region that the script would act within to select an image. The region is the area defined by the x-y coordinate of the top left corner, and the dimensions
of the rectangle. Each tab had a differently sized region because of the varying dimensions of the physical attribute images.

Essentially, one attribute image within the grid was considered as a cell, and after randomly selecting a grid page, a random cell was chosen by moving the mouse cursor to the center of the cell. In order to confirm that the image had enlarged and that there was an image located at the cursor coordinate, the script scanned the screen for a yellow box with the text “Selected,” which appears in the bottom right corner of the image. Other cases were also handled, such as when the script failed to recognize that the mouse was hovering over the image within the default waiting period. Once the “Selected” box was confirmed, the script checked whether there were variations for the attribute by scanning for an unselected gray arrow on the right side of the image. If there were variations for that attribute, the script would quickly rotate through them with a random number of clicks. It is worth noting that paging through images and rotating through variations respond quickly, which suggests that these operations run client-side. Once the random image was selected, the script would proceed to the next tab. Because of the similarity of the Eyes, Hair, and Clothing tabs to the Skin tab, the same method of collection was used for each.

After the random hair selection, an image of the avatar’s face would be captured in PNG format from a preview window, opened by the zoom button in the preview. The resolution of the captured image depended on the region dimensions. The avatar was rotated once toward the left side in order to capture a direct face-to-face image of the avatar. For the Body tab, the same method of selection used in the Face tab was applied because of the similarity between the interfaces of the two tabs.

After the selections in the Clothing tab, which contains sub-tabs for the top, bottom, and shoes, the image of the avatar’s body was captured. However, instead of the default position, which is a slightly angled arms-out position, a face-to-face standing position was chosen in the dropdown box in the zoomed preview window. Finally, the script would select “New Avatar,” located near the top right-hand corner of the web page, in order to repeat the process.

RESULTS

The program was run as continuously as possible over the course of seven days, and approximately 1030 successful images were captured, more of which were of the avatar body than of the avatar face. The initial days were primarily test runs that were used to find and resolve unforeseen bugs. There were more than 100 face images that were unsuccessful, yet the script managed to continue with the process and collect other potentially successful samples. One interesting point to note is that there were significantly more unsuccessful images of faces than of bodies, which totaled about 10-20 errors at most. This is primarily because the method for adjusting the avatar’s face was more problematic than that for the avatar’s body. Adjusting the avatar’s head relied on clicking ‘rotate left arrow’ icons, and these clicks were often ignored by the server. Other major problems included longer load times and Sikuli’s failure to recognize loading signals.

Most of these errors were due to server latency. Between every selection in the avatar customization process, there is a variable loading time and a loading box, which appears in the center of the screen. Surprisingly, the loading time differs for each attribute image applied to the avatar rather than for each type of physical attribute. For example, the loading time for two different but similarly categorized physical attributes would vary. One could conjecture that the number of polygons that an attribute adds to the avatar causes the variable loading time.

These problems did not occur very frequently, but their potential occurrence required hourly checkups, or else the script could fail if left unchecked. Overnight data collection was risky because of the lack of checkups, and even when the runs were successful, a number of defunct images were present in the dataset, for example previews of avatars that still have the loading text in the center of the preview.

The other problem is executing the script on a different computer. Because of differing monitor resolutions, adjustments to the program were required. At a minimum, the regions under each tab had to be changed, because each has a defined upper left coordinate that is correct only at the previous computer’s resolution. The coordinate must therefore be redefined; otherwise it would throw off the entire collection process.

CONCLUSION

Besides the unsuccessful captured images, there were very rare occurrences where abnormal morphing would occur (Figure 6). However, server latency was a greater concern, since images were often captured too early (Figures 5 and 8) because the loading text image wasn’t detectable when it was overlaid in the center of the avatar preview. As a result, the program would instead rely on detecting when the black area within the loading box
disappears before undertaking any action. However,
instructions to rotate the avatar’s head were often not
recognized, and as a result most unsuccessful
captured face images were in the default angle
(Figure 3). By comparison, changes of body position
were mostly recognized by the server, resulting in
more successful body images (Figure 7) than face
images (Figure 4).
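
The idea of waiting for the loading box to disappear before acting can be sketched as a small polling loop. This is only an illustrative sketch, not the script used in this work: the names `wait_until_gone` and `capture_when_ready`, the timeout and polling values, and the `is_loading` and `capture` callables are all assumptions. In a Sikuli script, `is_loading` would wrap whatever check detects the black area of the loading box, and `capture` would wrap the screenshot step.

```python
import time


def wait_until_gone(is_visible, timeout=30.0, poll=0.5,
                    clock=time.monotonic, sleep=time.sleep):
    """Poll is_visible() until it returns False.

    Returns True if the indicator disappeared, or False if `timeout`
    seconds passed while it was still visible. The default values are
    illustrative assumptions, not measured settings.
    """
    deadline = clock() + timeout
    while is_visible():
        if clock() >= deadline:
            return False
        sleep(poll)
    return True


def capture_when_ready(is_loading, capture, **wait_kwargs):
    """Call capture() only after the loading indicator has vanished.

    Returns the result of capture(), or None when the wait timed out,
    so that the caller can skip the sample instead of saving a preview
    that still shows the loading box.
    """
    if wait_until_gone(is_loading, **wait_kwargs):
        return capture()
    return None
```

Returning `None` on timeout rather than raising an error lets a long unattended run discard one avatar and continue with the next, which matches the goal of keeping the script alive despite server latency.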
Avatar images collected in previous research are of relatively lower quality than those in this data set. This is primarily due to the low graphics requirements adopted to open the virtual community to as many online users as possible without hardware being a limiting factor. Although low graphics result in basic avatar models, simple facial expressions can still be easily noticed and identified (Parke 1972). In more complex avatar models, such as those in this data set, there are thousands more polygons, which allow more complex expressions to be shown. Furthermore, gathering data from virtual communities is incredibly tedious, considering a number of other problems that exist, such as involuntary self-movement of the avatar (Oursler, Price, and Yampolskiy 2009). For example, the head or eyes of the avatar would always shift in place, and oftentimes the captured image wouldn’t be in the position displayed by the avatar in Figure 2. Comparing Figures 2 and 4, the images in this data set are of higher quality and of larger resolution.

Figure 3. Unsuccessful Captured Face Image - Default Angle Unchanged
Overall, this dataset could be repurposed not only for image-based CAPTCHA tests, but also for facial expression recognition and analysis. With the addition of thousands of polygons, creating more complex facial expressions is possible. As such, biometric analysis could be performed on such avatars as a precursor to the analysis of real user biometric data.

Figure 2. Avatar image captured from Second Life

Figure 4. Successful Captured Face Image

Figure 5. Unsuccessful Captured Face Image - Loading Box in Center
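
The grid selection described in the Methodology section, and the region adjustment needed when the script moves to a different monitor, can be illustrated with a short sketch. Everything named here (`GridRegion`, `pick_page_and_cell`, the row and column counts, the example resolutions) is a hypothetical illustration and not taken from the original script. The `scaled` method also assumes that the page layout scales proportionally with the screen resolution; if the layout keeps a fixed pixel size and merely shifts, a plain offset would be the appropriate correction instead.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class GridRegion:
    """Rectangle holding a grid of attribute images (hypothetical layout)."""
    x: int        # left edge of the region, in screen pixels
    y: int        # top edge of the region, in screen pixels
    width: int
    height: int
    rows: int
    cols: int

    def cell_center(self, row, col):
        """Screen coordinate of the center of the cell at (row, col)."""
        if not (0 <= row < self.rows and 0 <= col < self.cols):
            raise IndexError("cell outside grid")
        cell_w = self.width / self.cols
        cell_h = self.height / self.rows
        return (round(self.x + (col + 0.5) * cell_w),
                round(self.y + (row + 0.5) * cell_h))

    def random_cell_center(self, rng=random):
        """Center of a uniformly chosen cell, where the cursor would hover."""
        return self.cell_center(rng.randrange(self.rows),
                                rng.randrange(self.cols))

    def scaled(self, old_res, new_res):
        """Region rescaled from one (width, height) resolution to another.

        Assumes proportional scaling of the page, which may not hold.
        """
        sx = new_res[0] / old_res[0]
        sy = new_res[1] / old_res[1]
        return GridRegion(round(self.x * sx), round(self.y * sy),
                          round(self.width * sx), round(self.height * sy),
                          self.rows, self.cols)


def pick_page_and_cell(region, page_count, rng=random):
    """Choose a random zero-based page index and a random cell center."""
    return rng.randrange(page_count), region.random_cell_center(rng)
```

For example, with `region = GridRegion(100, 200, 400, 300, rows=3, cols=4)`, the call `region.cell_center(0, 0)` gives `(150, 250)`, and `region.scaled((1280, 800), (1920, 1200))` yields a region whose top left corner is at `(150, 300)`. Keeping the top left corner and dimensions in one object means only that object has to be redefined per tab and per machine.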
References
Boulay, Jacques-Andre. (2006) "Modeling: Evolver Character Builder." Virtual Reality News and Resources By/for the New World Architects. http://vroot.org/node/722. Retrieved on February 21, 2012.

Bursztein, Elie, Matthieu Martin, and John C. Mitchell. (2011) "Text-based CAPTCHA Strengths and Weaknesses." ACM Conference on Computer and Communications Security (CCS’2011). October 17-21, 2011. Chicago, USA.

Bursztein, Elie, Steven Bethard, Celine Fabry, John C. Mitchell, and Dan Jurafsky. "How Good Are Humans at Solving CAPTCHAs? A Large Scale Evaluation." 2010 IEEE Symposium on Security and Privacy. pp. 399-413. May 16-19, 2010. Berkeley, CA.

Geetha, Dr. G., and Ragavi V. "CAPTCHA Celebrating Its Quattuordecennial – A Complete Reference." IJCSI International Journal of Computer Science Issues 2nd ser. 8.6 (2011): 340-49. Print.

Lawton, George. (2010) "Screen-Capture Programming: What You See Is What You Script." Computing Now. IEEE Computer Society, Mar. 2010. Retrieved on February 22, 2012.

Oursler, Justin N., Mathew Price, and Roman V. Yampolskiy. Parameterized Generation of Avatar Face Dataset. 14th International Conference on Computer Games: AI, Animation, Mobile, Interactive Multimedia, Educational & Serious Games (CGames’09), pp. 17-22. Louisville, KY. July 29-August 2, 2009.

Parke, Frederick I. "Computer Generated Animation of Faces." ACM '72 Proceedings of the ACM Annual Conference. Vol. 1. NY: ACM New York, 1972. 451-57. Print.

Figure 6. Unsuccessful Captured Image - Abnormal Morph

Figure 7. Successful Captured Body Image

Figure 8. Unsuccessful Captured Body Image - Loading Box in Center