Image Annotation Through Gaming

Lasantha Seneviratne and Ebroul Izquierdo
Multimedia and Vision Lab, Queen Mary, University of London, Mile End Road, E1 4NS, London, UK.
{lasantha.s, ebroul.izquierdo}@elec.qmul.ac.uk

ABSTRACT

We introduce an interactive framework for image understanding: a game that is enjoyable to play and produces valuable image annotations. When people play the game, they provide useful information about the contents of an image. In practice, the most accurate way to describe the content of an image is manual labelling. Our approach is to motivate people to label images while entertaining themselves. If the game becomes popular, it would therefore be able to annotate most images on the web within a couple of months. To secure the accuracy of image labelling, we use a combination of computer vision techniques. We believe that in this way our system will make a significant contribution to addressing the semantic gap in computer vision.

1 INTRODUCTION

Recognising objects and semantic concepts in images is a main research topic in computer vision. There are billions of images on the web, and retrieving them using high-level semantic concepts is not yet accurate enough. Low-level feature extraction techniques can determine the differences between, and the distribution of, colours, textures and so on, but the gap between low-level features and high-level concepts remains an open issue. Over the last decade, problems related to this semantic gap have driven research in several directions. The web-based ESP game [1] is an "out of the box" approach that provides an appealing way to annotate images. The idea behind the game is to label images on the web according to their visual content. As mentioned, the most precise way to describe the content of an image is manual annotation. Given the billions of images on the web, however, conventional manual annotation is costly and impractical.

The main objective of this paper is to present an interactive approach to annotating images by manual labelling. To reduce the cost of manual annotation, we introduce a highly enjoyable framework. To ensure label validity, we combine several techniques, both psychological and computer vision based. On the psychological side, we use some simple techniques to assess the player's attitude. This lets us determine whether a player is a cheater and treat such players differently. At the same time, we use computer vision techniques to increase the accuracy of the labels. The goal is to classify an image according to the user's key-word query and annotate it accordingly. In real-world applications, an image may represent a scene that contains a number of objects. We therefore require annotation at the object level rather than of the whole scene. We use a very simple and effective approach called ‘elementary building elements of images’, or simply the image blocks technique [3]. The advantage of this approach is that it distinguishes the background from an object. Image blocks relate closely to objects, so each object is represented by a number of image blocks. We therefore use object-based image representation techniques in our framework.

This paper is organised as follows: Section 2 describes the general view of the system, Section 3 describes the performance measure, and the paper ends with conclusions and future work in Section 4.

2 GENERAL VIEW AND ARCHITECTURE

We call our system ‘Tag4Fun’: an interactive game built with the 3D graphics library OpenGL. Each game is played by a single player, and the system as a whole is meant to be played by a large number of players. The goal of the system is to annotate images according to their contents.

Tag4Fun displays the image that requires annotation, and the player comments on its contents. The basic game structure is similar to the well-known game ‘Tetris’. The major difference is that Tag4Fun uses characters instead of differently shaped building blocks. To speed up the annotation process, Tag4Fun uses three columns of moving characters. The 3D characters move from the top to the bottom of the screen, and the player collects them using the keyboard. To make the game more interactive, Tag4Fun also generates random ‘magic’ characters, which can be changed into any character. The player has to construct a key-word related to the contents of the image by collecting individual characters. The collected characters are used to select the pre-trained classifier for image classification, thus improving label accuracy.

The Tag4Fun game entertains and motivates the player and yields valuable key-words about what the image contains. At the same time, it helps to determine the player's attitude by drawing images from three different databases: non-annotated, partially annotated and fully annotated. The player is fed images at random from all three databases. If a player tries to annotate a fully annotated image using unrelated key-words, the system identifies the player as a cheater and from then on serves that player only partially annotated images. The key-words generated by such players are not used for any labelling.

Figure 2: Tag4Fun game framework

2.1 When is an image annotated?

When the classifier agrees with the label given for an image, the image is temporarily annotated and the player receives a certain number of points as encouragement to continue playing. As an image passes through Tag4Fun, it accumulates a number of possible labels. If an image is described with the same label 5 times, that key-word becomes a taboo word for the image, and players are no longer allowed to use it for further labelling. Once an image has 8 taboo words, it is considered fully annotated and is discarded from the database. All other captured information is saved for future reference. Because facts and language change over time, fully annotated images are loaded back a few months later so that their labels can be updated. (For example, George W. Bush is the president of the United States and will be a former president in the future.)

2.2 Low-level feature extraction

Most image retrieval systems fail to produce satisfactory results, especially when a user is interested in a particular object rather than the whole scene. We therefore used a simple but effective technique called ‘elementary building elements of images’, often called image blocks. This technique divides the whole image into blocks. A block may represent a particular type of object in the image, or a combination of blocks may represent a single object. We extracted low-level features from each block and classified the blocks manually to create a vocabulary of training sets. The choice of trained model is directly determined by the input query (key-word). Using the pre-trained models we were able to classify image blocks; because of the image block concept, we can find the blocks related to a particular object. In the future we will exploit this to help the player interact, by offering bonus points for pointing to the location of the object. This will allow us to test the player's attitude a second time and will provide more valuable information about the location of a particular object.

Low-level feature extraction used three low-level descriptors: the dominant colour descriptor (DCD), the colour layout descriptor (CLD) and the edge histogram descriptor (EHD). These descriptors are defined by MPEG-7 (MPEG: Moving Picture Experts Group). Using these descriptors we trained a support vector machine (SVM) [4] classifier for classification; this helps us to increase the accuracy of annotation and in turn minimise cheating.

3 PERFORMANCE MEASURE

Games of this type depend on the psychological behaviour of the player. It is therefore extraordinarily difficult to measure the performance of Tag4Fun unless it is played by a large number of users. As an ongoing project, Tag4Fun is not yet ready for commercialisation. Its performance was analysed in two ways. First, we analysed the performance of the classification process for three different concepts; second, we analysed the performance of the complete framework. For testing purposes the classifier was trained for three concepts using 10 images, equating to 320 image blocks. The concepts used are butterfly, tree and cougar. The performance obtained for the three concepts is shown in Table 1.

Precision    Butterfly    Cougar    Tree
CLD          45%          12%       65%
DCD          30%          5%        40%
EHD          45%          12%       40%

Table 1: Performance of the SVM classifier

According to Table 1, the classifier is not as accurate as expected, and we will keep working on it until we achieve a satisfactory result. Even with this classifier precision, however, the complete framework achieved 71% accuracy, measured with eight regular game players.

4 CONCLUSIONS AND FUTURE WORK

We introduced a computer game that encourages and motivates players to annotate images manually. The proposed framework was tested with eight regular game players and its performance was acceptable. As an ongoing project, we will develop and improve the whole system to achieve highly accurate labels. We will also improve the quality of the system with respect to the psychological aspects of regular game players.

Future work will mainly focus on techniques for improving the accuracy of the annotation process and on combining low-level features to improve the accuracy of the classification process.

5 REFERENCES

[1] Luis von Ahn and Laura Dabbish, "Labeling Images with a Computer Game", Pittsburgh, PA, USA, 2004.
[2] Luis von Ahn, Shiry Ginosar, Mihir Kedia, Ruoran Liu and Manuel Blum, "Improving Accessibility of the Web with a Computer Game", Pittsburgh, 2006.
[3] Qianni Zhang and E. Izquierdo, "Optimizing Metrics Combining Low-Level Visual Descriptors for Image Annotation and Retrieval", 2006.
[4] Chih-Wei Hsu, Chih-Chung Chang and Chih-Jen Lin, "A Practical Guide to Support Vector Classification", 2008.
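
As an illustration of the annotation rule in Section 2.1, the following minimal Python sketch tracks the labels proposed for one image. Only the two thresholds (5 repetitions for a taboo word, 8 taboo words for a fully annotated image) come from the paper. The class name `ImageRecord`, its methods, and the case-insensitive matching are hypothetical choices made for this example. The paper does not state whether a label is counted only when the classifier agrees; this sketch simply counts every accepted submission.

```python
from collections import Counter

TABOO_THRESHOLD = 5      # same label given 5 times -> taboo word (Section 2.1)
FULLY_ANNOTATED_AT = 8   # 8 taboo words -> image is fully annotated (Section 2.1)


class ImageRecord:
    """Labels proposed so far for one image (hypothetical helper, not Tag4Fun code)."""

    def __init__(self):
        self.label_counts = Counter()   # how often each key-word was accepted
        self.taboo_words = set()        # key-words no longer allowed for this image

    @property
    def fully_annotated(self):
        return len(self.taboo_words) >= FULLY_ANNOTATED_AT

    def submit(self, keyword):
        """Record a key-word; return False if it is rejected.

        Assumption: key-words are compared case-insensitively.
        """
        word = keyword.strip().lower()
        if self.fully_annotated or word in self.taboo_words:
            return False
        self.label_counts[word] += 1
        if self.label_counts[word] >= TABOO_THRESHOLD:
            self.taboo_words.add(word)
        return True
```

In this sketch, a key-word is accepted five times and rejected from the sixth submission onwards; once eight key-words have become taboo, `fully_annotated` is true and the image would be removed from the pool of images still to be labelled.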
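
The cheater handling described in Section 2 can be sketched in the same spirit. The three database names follow the paper; everything else is an assumption made for illustration. In particular, the paper does not say how an "unrelated" key-word is detected or how many offences are needed, so this sketch treats any key-word absent from the image's known labels as unrelated and flags the player on the first offence. `Player`, `next_database` and `check_submission` are hypothetical names.

```python
import random

DATABASES = ("non-annotated", "partially-annotated", "fully-annotated")


class Player:
    """Minimal player state (hypothetical)."""

    def __init__(self):
        self.is_cheater = False


def next_database(player, rng=random):
    """Choose the database the next image is drawn from.

    Regular players are fed at random from all three databases;
    flagged players only receive partially annotated images.
    """
    if player.is_cheater:
        return "partially-annotated"
    return rng.choice(DATABASES)


def check_submission(player, database, keyword, known_labels):
    """Return True if the key-word may be used for labelling.

    Assumption: on a fully annotated image, a key-word that is not among
    the known labels counts as unrelated and flags the player at once.
    Key-words from flagged players are never used.
    """
    word = keyword.strip().lower()
    if database == "fully-annotated" and word not in known_labels:
        player.is_cheater = True
    return not player.is_cheater
```

A real system would probably tolerate occasional mistakes before flagging a player; the single-offence rule here only keeps the example short.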