An OntologyBased Approach to TexttoPicture Synthesis Systems Dmitry Ustalov, Aleksandr Kudryavtsev Ural Federal University, Yekaterinburg, Russia dmitry@eveel.ru, vt@dpt.ustu.ru Abstract. In this paper, we present an ontologybased approach to texttopicture synthesis. Our approach operates with an ontology in the RDF/XML format. This provides loose coupling of the system compo- nents, unication of the interacting objects representation and their be- haviour, and makes possible verication of system information resources. Keywords: texttopicture, texttoscene, natural language process- ing, depiction rules, semantic representation, Web Ontology Language, Resource Description Framework. 1 Introduction A picture is worth of a thousand words. The texttopicture synthesis problem is actual because of existence of many domains where clearness of textual informa- tion is necessary: foreign language learning [12], trac accident visualization [1], rehabilitation of people with cerebral injuries [4], etc. Utkus [10] is a texttopicture synthesis system (TTP system) that is de- veloped since 2011 and is designed to work with small texts of 13 Russian sentences: fragments from children literature, microblog posts, news summaries, comments on Websites. These texts are suitable for automatic processing and further visualization [13]. The current work diers from previous TTP systems in its focus on conveying the gist of general, semantically unrestricted Russian language text. TTP systems have three stages of processing [2]: 1. A stage of linguistic analysis  tokenization, morphological and syntactic parsing, obtaining the semantic representation of the input text; 2. A stage of depictors generation  generation of the set of graphical depictors that corresponds with obtained semantic representation; 3. A stage of picture synthesis  construction of vector or raster image from the graphical primitives that are positioned following the generated depictors. In TTP systems, every processing stage strongly depends on many informa- tion resources, including:  Thesaurus that containts words and their relations (synonymy, hyponymy, etc); An OntologyBased Approach to TexttoPicture Synthesis Systems 95  Gallery that contains dierent graphical primitives for interacted objects (actors), which becomes rendered in the nal images;  Depiction rules that dene how one or many actors can be depicted into the nal images;  Frames, which describe allowed properties of actors. The volume and complexity of these resources are high. Therefore, TTP sys- tems must have a straight way to connect such resources during the text pro- cessing. 2 Related works There are several fullfunctional analogues that are described in various pa- pers [1, 2, 4, 5, 11, 13]. Unfortunately, an approach to unication the information resources is presented only in [2]. That paper presented the WordsEye system, which builds 3D scenes by with certain descriptive English sentences, e.g., The huge head is on the tan horse. The horse is on the extremely tall mountain range. The fence is 10 feet behind the horse. The fence is 50 feet long. The following desicions are made in the WordsEye system: 1. WordNet thesaurus is used to identify the semantic relations between sepa- rate words; 2. During the text processing, specially dened frames are mapped into the found syntactic groups to obtain additional information about actors: colour, size, etc; 3. Behaviour that is implemented in known actions (verbs), and is described in depiction rules, which are dened in a declarationstyle Lisp program; 4. A proprietary Izware Mirai 3D animation system is used with Viewpoint Model Library to perform visualization problem. Note two signicant drawbacks of these decisions: 1. Despite of rich possibilities of the Lisp programming language, usage of this language complicates the replenishment of depiction rules set because of high requirements of developers experience; 2. Work in 3D demands considerable eorts and resources, which are not justi- ed by nal quality: in most cases, it is enough to deal with 2D images [12]. 3 Suggested approach Similarly to [2], we consider actors in terms of object paradigm: 1. Actors have properties: colour, etc; 2. Actors have methods: functions that reect actors relations: to fall, to lay, etc. 96 D.Ustalov et al. Fig. 1. Connection of ontology, thesaurus, and gallery Fig. 2. Connection of ontology, thesaurus, and depiction rules We propose to formalize into an ontology all the knowledge about actors: their possible characteristics and relations. We also propose to split the ontol- ogy, thesaurus, depiction rules and gallery to provide loose coupling of these components of the TTP system (Fig. 1, 2):  Words and their semantic relations are represented in a thesaurus; An OntologyBased Approach to TexttoPicture Synthesis Systems 97  Several gures from gallery can be associated with each word in thesaurus: despite words tomcat and cat are antonymes by gender, they both are hy- ponymes to word animal ;  Ontology has the class Actor, and instances of this class are linked to synsets in thesaurus. Therefore, for every set of synsets an Actor instance can be dened by correspondent properties;  There are dened object properties for Actor instances. These object proper- ties are associated with verb synsets in thesaurus and represent all possible relations among actors (i.e., fall(actor) and fallTo(actor1 actor2));  Also, there data properties are dened and represent dierent parameters of actors (e.g., colour);  Depiction rules that specify the behaviour of each object property (Fig. 2) are dened in a separate XML document. Elements of ontology are linked to thesaurus synsets using the OWL annota- tion mechanism. It is important to note that one element can be linked to many synsets. These synsets can belong to thesauri of dierent language because of internationalization method that is implemented in OWL. 3.1 Examples The Actor class is a direct subclass of the Thing class: Instances of the Actor class (Fig. 3) can be linked to synsets using OWL annotations: 2039 2040 238 6939 75 Object properties are also dened for the Actor class, and they represent all the predicates that are operated by the system. Object properties are connected with depiction rules that specify their behaviour. Equivalence relation is possible between object properties. In our approach, the SPOtriples1 (fall man) and (fall man chair) will be attributed to dif- ferent object properties : fall and fallTo. 1 SPO  a tuple of predicate, subject, and object. 98 D.Ustalov et al. Fig. 3. Ontology fragment with Actor instances 106 397 406 106 397 406 To represent detected object properties on the nal picture, it is necessary to assign the specic behaviour to each known object property. This behaviour is specied by depiction rules which are declared in a separate XML document. For the fallTo object property we have the following depiction rule : An OntologyBased Approach to TexttoPicture Synthesis Systems 99 In this example, the subject and object of the predicate would be put together, and the subject will be diverted onto object. 4 Implementation The Utkus prototype under discussions was written on the Ruby programming language: 1. Link Grammar for Russian syntactic parser [6] is used because of its avali- bility and easy parseable format; 2. Only verb phrases and related noun phrases are extracted from the depen- dency tree of each sentence of the source text. These syntactic groups are mapped into the SPOtriplets; 3. Ontology is dened in the RDF/XML format using the Protege editor; 4. There are only synsets in our Russian dictionary [9]: no hyponyms, etc; 5. Gallery is composed by sprites from The Noun Project [7] collection. These sprites are cropped, rasterized, and associated with noun synsets; 6. Final rendering is performed using GD2 library in form of PNG raster images of 640 × 480. As example, there are four images that been generated by Utkus system. With a view of place economy, these images been cropped. These images (Fig. 4(a), 4(b), 4(c), 4(d)) are correspond to texts: 1. A man has fallen into the re2 ; 2. Several houses3 ; 3. There are a man and a woman in the house4 ; 4. A certicate, a bear, a rain5 . It should be noted that Utkus system is unable to represent plural words (Fig. 4(b)) at this moment. 5 Conclusion We have presented the approach to organize the TTP system information re- sources. This approach provides loose coupling of ontology, thesaurus, gallery and depiction rules. Main advantages of this approach are: 2 ×åëîâåê óïàë â îãîíü, in Russian. 3 Íåñêîëüêî äîìîâ, in Russian. 4  äîìå íàõîäèëèñü ìóæ÷èíà è æåíùèíà, in Russian. 5 Àòòåñòàò, ìåäâåäü, äîæäü, in Russian. 100 D.Ustalov et al. (a) A man has fallen into the (b) Several houses re (c) A certicate, a bear, a rain (d) There are a man and a woman in the house Fig. 4. Depiction of the texts 1. Simplicity of development and modication all the information resources that are used by TTP system:  Ontology can be modied with any available ontology editor (e.g., Pro- tege);  Depiction rules can be edited with any text editor, or any XML editor;  Thesaurus and gallery data can be modied as any data in relational database (in our implementation, PostgreSQL is used). 2. RDF/XML ontology allows one to reuse these resources in other applications and domains; 3. Verication instruments (such as inference systems) can help us to control the quality of information resources. Figures 4(a), 4(b), 4(c) are produced during testing our Utkus TTP system under development. The Utkus TTP system is based on this approach. An OntologyBased Approach to TexttoPicture Synthesis Systems 101 5.1 Future Work We have several reasons for future work: 1. To switch to the fullfeatured thesaurus to unify the thesauri resources (e.g., Russian WordNet [8]); 2. To enhance the linguistic analysis subsystem to handle such parts of speech as adjectives, pronouns, numerals, etc; 3. To solve the problem of predicate ambiguation [3] when generating the se- mantic representation; 4. To perform experiments on the Utkus prototype and make changes in the system components, if necessary. Acknowledgements. Authors would like to thank the Institute of Mathematics and Mechanics UrB RAS for the provided computer equipment. References 1.  Akerberg, O., Svensson, H., Schulz, B., Nugues, P.: CarSim: an automatic 3D text-to-scene conversion system applied to road accident reports. In: Proceedings of the 10th Conference on European Chapter of the Association for Computa- tional LinguisticsVolume 2. pp. 191194. Association for Computational Linguis- tics (2003) 2. Coyne, B., Sproat, R.: Wordseye: an automatic text-to-scene conversion system. In: Proceedings of the 28th Annual Conference on Computer Graphics and Interactive Techniques. pp. 487496. ACM (2001) 3. Fomichov, V.: A comprehensive mathematical framework for bridging a gap be- tween two approaches to creating a meaning-understanding web. International Journal of Intelligent Computing and Cybernetics 1(1), 143163 (2008) 4. Goldberg, A., Rosin, J., Zhu, X., Dyer, C.: Toward text-to-picture synthesis. In: NIPS 2009 Mini-Symposia on Assistive Machine Learning for People with Disabil- ities (2009) 5. Li, H., Tang, J., Li, G., Chua, T.: Word2image: Towards visual interpretation of words. In: The 16th ACM International Conference on Multimedia (2008) 6. Link Grammar for Russian, http://slashzone.ru/parser/ 7. NounProject, http://thenounproject.com 8. Russian Wordnet, http://www.wordnet.ru 9. Russian Language Dictionaries, http://speakrus.ru/dict/index.htm 10. Utkus, http://utkus.eveel.ru 11. Yamada, A., Yamamoto, T., Ikeda, H., Nishida, T., Doshita, S.: Reconstructing spatial image from natural language texts. In: Proceedings of the 14th Conference on Computational LinguisticsVolume 4. pp. 12791283. Association for Compu- tational Linguistics (1992) 12. Yoshii, M., Flaitz, J.: Second language incidental vocabulary retention: The eect of text and picture annotation types. CALICO journal 20(1), 3358 (2002) 13. Zhu, X., Goldberg, A., Eldawy, M., Dyer, C., Strock, B.: A text-to-picture synthesis system for augmenting communication. In: Proceedings of the National Conference on Articial Intelligence. vol. 22, p. 1590. Menlo Park, CA; Cambridge, MA; Lon- don; AAAI Press; MIT Press; 1999 (2007)