Automated Collection of High Quality 3D Avatar Images

James Kim, Darryl D'Souza, Roman V. Yampolskiy
Computer Engineering and Computer Science
University of Louisville, Louisville, KY
jhkim012@louisville.edu, darryl.dsouza@louisville.edu, roman.yampolskiy@louisville.edu

Abstract

CAPTCHAs are security tests designed to let users easily identify themselves as humans; however, as research shows (Bursztein et al. 2010), these tests are not necessarily easy for humans to pass. As a result, a new test is proposed that asks users to identify images of real humans among images of 3D virtual avatars, creating a potentially unsolvable obstacle for computers and a quick, easy verification for humans. To provide test cases for this new test, an automated bot is used to collect images of 3D avatars from Evolver.com.

INTRODUCTION

Today's commonly used security test for distinguishing real humans from bots on the Internet is the CAPTCHA. Two noticeable issues result from using such tests. The first is that these tests are gradually becoming less reliable as exposed design flaws and improved algorithms allow CAPTCHAs to be cracked. For example, deCAPTCHA, a CAPTCHA-cracking program developed by a group of Stanford researchers, successfully decrypted image- and audio-based text CAPTCHAs from a number of well-known sites. The success rate varies from site to site, but some examples include 66% on Visa's Authorize.net, 43% on eBay, and 73% on captcha.com (Bursztein, Martin, and Mitchell 2011). Each website uses a different combination of text distortion factors in its CAPTCHAs; however, design flaws were evident in most tests. The deCAPTCHA program was able to achieve its 43% success rate on eBay because the researchers exploited the fact that eBay's CAPTCHAs used a fixed number of digits and regular font sizes (Bursztein, Martin, and Mitchell 2011).

Second, according to research, humans encounter difficulties in solving CAPTCHAs, especially audio CAPTCHAs. Perhaps the tests are becoming more challenging for human users than they are for computers, considering deCAPTCHA's performance (Bursztein et al. 2010). The test proposed here is designed to be an unsolvable obstacle for computers while remaining easy for humans. One may even hypothesize that there is a correlation between the combination of factors that determine a CAPTCHA's insolvability to computers and the level of ease with which a human can solve it.

Other, simpler, more novel and user-friendly types of CAPTCHAs do exist, such as the Animal Species Image Recognition for Restricting Access (ASIRRA) test or the "drag and drop" CAPTCHA (Geetha and Ragavi 2011). However, text-based CAPTCHAs are more widely used than sound- or image-based tests because of the numerous distortion methods available to increase their insolvability to computers, despite the added difficulty these methods create for humans (Geetha and Ragavi 2011). Each test has its own design flaws, so security tests that present the highest level of insolvability to computers while being easiest for humans to solve are sought.

Computers have the potential to detect text in a variety of distorted images, but their ability to differentiate between two objects of different colors or textures is a different matter entirely. On this foundation, a test was proposed that builds a small table of images of virtual avatars and real humans and asks the user to select those that show real humans. Generally, the test would include two or more images of real humans in order to increase the difficulty for computers.

In addition, the colors, textures, and other factors of the images would be varied in order to leave as few exploitable regularities as possible. For example, the images of virtual avatars or humans may be black and white, pastel, or even sketched. Furthermore, various clothing and accessories would potentially add more to the challenge. Inevitably, the test requires an algorithm that demands critical thinking, one that answers the question "How does a real human appear?"
In order to begin testing, a dataset of high quality 3D virtual avatars was required. Virtual communities such as Second Life or Active Worlds provide an extensive selection of body characteristics and articles of clothing for avatar creation. The tradeoff for this extensive customization is the detail, or quality, of the avatars. Considering that both of these virtual communities have massive online user bases, it is understandable that the system requirements for running the client application are very low, since hardware costs are often a barrier to accessing online content.

Evolver, a 3D avatar generator that allows users to create and customize their own avatars, was found to be the best choice. Its avatars are highly detailed and elaborate, with options to orient them into certain body positions, facial expressions, or animations. Although Evolver does not possess as extensive a selection of avatar characteristics as the other virtual worlds, its selection was large enough for our purposes.

In addition, one noticeable difference between Evolver and the other virtual worlds is the option to morph different physical attributes together. Evolver uses a large database of pre-generated body parts, called a virtual gene pool, to morph two physical attributes together in morph bins. Users choose the degree to which the attributes are morphed by using a slider, which measures the domination of one attribute over the other (Boulay 2006).

The avatar images were collected using Sikuli, an image-recognition-based scripting environment that automates graphical user interfaces (Lawton 2010). Sikuli's integrated development environment features an image capturing tool that allows users to collect the screenshots required as parameters for its functions.
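To make this scripting style concrete, the following is a minimal sketch of image-driven Sikuli (Jython) automation of the kind described above. It assumes the functions Sikuli's IDE provides globally (setAutoWaitTimeout, wait, click, exists) and hypothetical screenshot names such as evolver_male_button.png captured with the IDE's image tool; it is illustrative rather than a reproduction of the actual collection script.

    # Minimal Sikuli (Jython) sketch of image-driven GUI automation.
    # The .png names are hypothetical placeholders captured with
    # Sikuli's screenshot tool; they are not part of Evolver itself.
    import random

    setAutoWaitTimeout(15)  # seconds to wait for images to appear

    # Randomly start a male or female avatar, as the collection bot does.
    gender_button = random.choice(["evolver_male_button.png",
                                   "evolver_female_button.png"])
    wait(gender_button)     # block until the button is visible on screen
    click(gender_button)    # image-recognition-based click

    # Continue only once the Face tab has finished loading.
    if exists("face_tab.png", 30):
        click("face_tab.png")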
A Sikuli script can be run continuously, but there is always some factor, such as server latency, that can cause an error, and such an error is a major setback to data collection because the script terminates. As a result, Evolver's simple and easily navigable user interface was another reason why it was selected over Second Life, Active Worlds, and other virtual communities. Evolver's interface divides physical attributes and articles of clothing into organized tabs, which decreases the possibility of failure when moving from one section to another.

Active Worlds in particular relies on scrolling, which at times significantly impeded data collection, because the possibility of failure increased whenever the script failed to scroll down and could not continue. In general, only a few images failed to be generated because the Sikuli script misrecognized certain images. In any case, when collecting avatar images from the web, server latency is a major factor in successful collection, as it can delay the timing of the actions dictated in the script.

METHODOLOGY

The Evolver avatar customization interface is somewhat unusual in that it prioritizes displaying explicit detail of a particular attribute over emphasizing the overall variety in a section. This visual design explains why, when the mouse hovers over an attribute, the image automatically enlarges to give the user a better view. Some physical attributes with variations, such as varying colors or patterns, show gray arrows on each side of the enlarged image, which allow the user to rotate through the variations instead of viewing all of them in a grid on a reloaded webpage.

At the start of execution, the script randomly clicks either the male or the female gender option to begin the avatar creation process. Once the webpage loads the Face tab, the script makes sure that the checkboxes for both male and female physical attributes are selected, so as to open more choices for a truly random avatar. Seven icons are present on the left side, each of which focuses on a specific area of the face (Figure 1). In each area, a slider enables the user to control the degree of attribute morphing. The random die button is clicked first to obtain two random face presets, and the slider is then dragged to a random location along its track to produce a random degree of morphing. The script clicks 65 pixels down from the center of the first selected icon in order to randomize the degree of morphing for each area.
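As a rough illustration of this face-randomization step, the sketch below clicks the random die, offsets 65 pixels below an area icon, and drags the morph slider to a random position. It again assumes Sikuli's built-in functions (find, click, dragDrop, Location) and hypothetical image names; the slider offset range is an assumption, not a measured value.

    # Sketch of the face-randomization step (Sikuli/Jython).
    # "random_die.png", "face_area_icon.png", and "morph_slider_handle.png"
    # are hypothetical screenshots of Evolver controls.
    import random

    click("random_die.png")                 # load two random face presets

    # Click 65 pixels below the center of the selected area icon,
    # which randomizes the degree of morphing for that area.
    icon_center = find("face_area_icon.png").getCenter()
    click(Location(icon_center.x, icon_center.y + 65))

    # Drag the morph slider handle to a random point along its track.
    handle = find("morph_slider_handle.png").getCenter()
    offset = random.randint(-80, 80)        # assumed track half-length in pixels
    dragDrop(Location(handle.x, handle.y),
             Location(handle.x + offset, handle.y))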
The script then proceeds to the Skin tab, on which a grid shows all the physical attributes for the skin. Under each tab there is a specific number of pages of attributes, so the script randomly clicks through the pages, given the range of pages, and then randomly selects an image "cell" within the grid. For most of the tabs, which share a similar grid design, there is a region within which the script acts to select an image. The region is the area defined by the x-y coordinate of its top left corner and the dimensions of the rectangle. Each tab has a differently sized region because of the varying dimensions of the physical attribute images.

Figure 1. Face Tab (Evolver.com)

Essentially, one attribute image within the grid is treated as a cell, and after randomly selecting a grid page, a random cell is chosen by moving the mouse cursor to the center of that cell. To confirm that the image has enlarged and that an image is actually located at the cursor coordinate, the script scans the screen for a yellow box with the text "Selected," which appears in the bottom right corner of the image. Other cases were also handled, such as the script failing to recognize within the default waiting period that the mouse was hovering over the image. Once the Selected box is confirmed, the script checks whether there are variations for the attribute by scanning for an unselected gray arrow on the right side of the image. If there are variations for that attribute, the script quickly rotates through them with a random number of clicks. It is worth noting that viewing images on each page, or rotating through the variations, responds quickly, which suggests that these operations run client-side. Once the random image is selected, the script proceeds to the next tab. Because the Eyes, Hair, and Clothing tabs are similar to the Skin tab, the same method of selection was used for each.
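The grid selection just described can be sketched as follows. The Region coordinates, cell size, and the screenshots of the yellow "Selected" marker and the gray variation arrow are hypothetical; as noted above, the real region and cell dimensions differ from tab to tab.

    # Sketch of random attribute selection inside a grid region (Sikuli/Jython).
    # Region coordinates, cell size, and image names are assumed values.
    import random

    grid = Region(300, 250, 600, 400)     # x, y, width, height of the attribute grid
    CELL_W, CELL_H = 120, 100             # assumed size of one attribute "cell"
    cols = grid.w // CELL_W
    rows = grid.h // CELL_H

    # Hover over the center of a randomly chosen cell so it enlarges.
    col = random.randint(0, cols - 1)
    row = random.randint(0, rows - 1)
    cell_center = Location(grid.x + col * CELL_W + CELL_W // 2,
                           grid.y + row * CELL_H + CELL_H // 2)
    hover(cell_center)

    # Confirm the enlargement by scanning for the yellow "Selected" box,
    # then rotate through any color/pattern variations a random number of times.
    if grid.exists("selected_marker.png", 5):
        if grid.exists("gray_right_arrow.png", 2):
            for _ in range(random.randint(0, 4)):
                click("gray_right_arrow.png")
        click(cell_center)                # apply the chosen attribute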
After the random hair selection, an image of the avatar's face is captured in PNG format from a preview window, which is opened with the zoom button in the preview. The resolution used to capture the image depends on the dimensions of the capture region. The avatar is rotated once toward the left side in order to capture a direct, face-on image. For the Body tab, the same method of image selection used in the Face tab is applied, because the two tabs' interfaces are so similar.

After the images for the Clothing tab, which contains sub-tabs for tops, bottoms, and shoes, are selected, an image of the avatar's body is captured. However, instead of the default position, which is a slightly angled arms-out pose, a face-on standing pose is chosen from the dropdown box in the zoomed preview window. Finally, the script selects "New Avatar," located near the top right corner of the web page, in order to repeat the process.

RESULTS

The program was run as continuously as possible over the course of seven days, and approximately 1030 successful images were captured, more of them of avatar bodies than of avatar faces. The initial days were primarily test runs used to find and resolve unforeseen bugs. More than 100 face images were unsuccessful, yet the script managed to continue the process and collect other, potentially successful samples. One interesting point is that there were significantly more unsuccessful images of faces than of bodies, which totaled about 10-20 errors at most. This is primarily because the method for adjusting the avatar's face was more problematic than that for the body: adjusting the avatar's head relied on clicking 'rotate left arrow' icons, which were often ignored by the server. Other major problems were longer load times and Sikuli's failure to recognize loading signals.

The source of most of these errors was server latency. Between every selection in the avatar customization process there is a variable loading time, during which a loading box appears in the center of the screen. Surprisingly, the loading time differs for each attribute image applied to the avatar, not merely for each type of physical attribute; the loading times for two different but similarly categorized physical attributes would vary. One could conjecture that the number of polygons an attribute adds to the avatar causes the variable loading time.
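One way to cope with this variable loading time, consistent with the loading-box behavior described above, is to wait for the loading indicator to disappear before taking a screenshot. The sketch below uses Sikuli's waitVanish and Screen.capture; the loading-box image, preview region, timeout, and save path are all assumptions, and capture() is assumed to return the path of a temporary PNG as in Sikuli's Jython API.

    # Sketch: wait out the variable loading time before capturing the preview.
    # "loading_box.png" and the preview region are hypothetical placeholders.
    import shutil

    preview = Region(820, 200, 400, 500)    # assumed avatar preview area

    def capture_preview(dest_path, timeout=60):
        # Wait until the loading box overlaid on the preview disappears.
        if not preview.waitVanish("loading_box.png", timeout):
            return False                    # still loading; skip this capture
        # capture() writes a temporary PNG and returns its path.
        tmp = Screen().capture(preview)
        shutil.move(tmp, dest_path)
        return True

    capture_preview("avatar_0001_face.png")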
These problems did not occur very frequently, but their potential occurrence required hourly checkups; otherwise, the script could fail if left unchecked. Overnight data collection was risky because of the lack of checkups, and even when overnight runs succeeded, a number of defective images were present in the dataset, for example, previews of avatars that still had the loading text in the center of the preview.

The other problem is executing the script on a different computer. Because of differing monitor resolutions, adjustments to the program were required. At minimum, the regions under each tab had to be changed, because each has a defined upper-left coordinate that is correct only at the previous computer's resolution. These coordinates must be redefined, or the entire collection process is thrown off.

CONCLUSION

Besides the unsuccessfully captured images, there were very rare occurrences of abnormal morphing (Figure 6). However, server latency was a greater concern, since images were often captured too early (Figures 5 and 8) because the loading text image was not detectable when overlaid in the center of the avatar preview. As a result, the program instead relied on detecting when the black area within the loading box disappeared before undertaking any action. However, instructions to rotate the avatar's head were often not recognized, and as a result most unsuccessfully captured face images remain at the default angle (Figure 3). By comparison, changes of body position were usually recognized by the server, resulting in more successful body images (Figure 7) than face images (Figure 4).

Avatar images collected in previous research are of relatively lower quality than those in this dataset. This is primarily due to the low graphics requirements adopted to open the virtual community to as many online users as possible without hardware being a limiting factor. Although low graphics settings result in basic avatar models, simple facial expressions can still be easily noticed and identified (Parke 1972). In more complex avatar models, such as those in this dataset, thousands of additional polygons allow more complex expressions to be shown.

Furthermore, gathering data from virtual communities is incredibly tedious, considering the number of other problems that exist, such as involuntary self-movement of the avatar (Oursler, Price, and Yampolskiy 2009). For example, the head or eyes of the avatar would constantly shift in place, and often the captured image would not be in the position displayed by the avatar in Figure 2. Comparing Figures 2 and 4, the images in this dataset are of higher quality and larger resolution.

Overall, this dataset could be repurposed not only for image-based CAPTCHA tests but also for facial expression recognition and analysis. With the addition of thousands of polygons, more complex facial expressions become possible. As such, biometric analysis could be performed on such avatars as a precursor to the analysis of real user biometric data.

Figure 2. Avatar image captured from Second Life
Figure 3. Unsuccessful Captured Face Image - Default Angle Unchanged
Figure 4. Successful Captured Face Image
Figure 5. Unsuccessful Captured Face Image - Loading Box in Center
Figure 6. Unsuccessful Captured Image - Abnormal Morph
Figure 7. Successful Captured Body Image
Figure 8. Unsuccessful Captured Body Image - Loading Box in Center

References

Boulay, Jacques-Andre. (2006) "Modeling: Evolver Character Builder." Virtual Reality News and Resources By/for the New World Architects. http://vroot.org/node/722. Retrieved on February 21, 2012.

Bursztein, Elie, Matthieu Martin, and John C. Mitchell. (2011) "Text-based CAPTCHA Strengths and Weaknesses." ACM Conference on Computer and Communications Security (CCS'2011), October 17-21, 2011, Chicago, USA.

Bursztein, Elie, Steven Bethard, Celine Fabry, John C. Mitchell, and Dan Jurafsky. (2010) "How Good Are Humans at Solving CAPTCHAs? A Large Scale Evaluation." 2010 IEEE Symposium on Security and Privacy, pp. 399-413, May 16-19, 2010, Berkeley, CA.

Geetha, G., and V. Ragavi. (2011) "CAPTCHA Celebrating Its Quattuordecennial - A Complete Reference." IJCSI International Journal of Computer Science Issues, Vol. 8, Issue 6, No. 2: 340-349.

Lawton, George. (2010) "Screen-Capture Programming: What You See Is What You Script." Computing Now, IEEE Computer Society, March 2010. Retrieved on February 22, 2012.

Oursler, Justin N., Mathew Price, and Roman V. Yampolskiy. (2009) "Parameterized Generation of Avatar Face Dataset." 14th International Conference on Computer Games: AI, Animation, Mobile, Interactive Multimedia, Educational & Serious Games (CGames'09), pp. 17-22, July 29-August 2, 2009, Louisville, KY.

Parke, Frederick I. (1972) "Computer Generated Animation of Faces." Proceedings of the ACM Annual Conference (ACM '72), Vol. 1, pp. 451-457. New York: ACM.