AMRITA-CEN@FIRE2015: Automated Story Illustration using Word Embedding

Sanjay S. P., Nivedhitha Ezhilarasan, Anand Kumar M and Soman K P
Centre for Excellence in Computational Engineering and Networking,
Amrita Vishwa Vidyapeetham, Ettimadai, Coimbatore, India
sanjay.poongs@gmail.com, e.nivedhitha@gmail.com, m_anandkumar@cb.amrita.edu




ABSTRACT
Story books are copiously illustrated, and the illustrations are essential to the enjoyment and understanding of the story. Often the pictures turn out to be more important than the text itself; in such cases, our principal job is to locate the best pictures to show. Stories written for children must be enriched with pictures to hold a child's interest, for a picture is often worth a thousand words. This system was built as part of a shared task of the Forum for Information Retrieval Evaluation (FIRE) 2015 workshop. We present a methodology for automatically illustrating a given children's story with appropriate images from the Wikipedia ImageCLEF 2010 dataset, for better learning and understanding.

Keywords
Automated Story Illustration; Story Picturing Engine; Image Ranking; Word Embedding; WordNet; Machine Learning; TF-IDF; Image Retrieval



1. INTRODUCTION
A child is sensitive to pictures even before he or she can talk. This is not surprising if we consider that an infant readily distinguishes its mother's face from those of strangers. The child's mother, sister, sibling and the stranger can all be viewed as living, moving pictures. In the same way, a child will recognize a favorite toy or pet. Stories are always preferred when they come along with beautiful images depicting the content. This clearly shows the importance of image illustration in children's short stories. However, few systems can automatically convert general textual information into a pictorial representation. This paper describes our system for the FIRE 2015 Automated Story Illustration task, using the Wikipedia ImageCLEF 2010 dataset. The task focuses on automatically illustrating a story with corresponding images, thereby making reading and understanding easier; one can grasp the core content of the story just by looking at the images.

Figure 1: An image generated by our engine

A sample output (Figure 1) for the story "THE FOX AND THE CROW" is shown. An image corresponding to each entity is produced. Our story illustration system automatically generates pictures that aim to convey the gist of general natural-language text.

2. RELATED WORK
Our present work is inspired by a rich body of prior work. A similar system, the Story Picturing Engine, was built using techniques designed for content-based image retrieval and textual information retrieval [5]. In recent years, learned statistical models have been widely used for the linguistic indexing of pictures [7]. Image annotations can also be modeled using latent Dirichlet allocation (LDA) [10].

In this section we analyze the research issues related to image illustration.

The task of automatically generating the words that describe a picture is called image annotation; it is used in image search and retrieval applications [6]. Story illustration, on the contrary, aims at selecting the set of images that best describes a given text, in our case a story, so the two problems can be seen as inverses of each other. To rank the pictures for a given story, our system uses an unsupervised algorithm built entirely on the concept of word embedding. A word embedding is a mapping of a word to an n-dimensional vector space; this real-valued vector representation captures semantic and syntactic features. Gensim, an open-source Python library, implements this concept, and our ranking scheme is based on TF-IDF using gensim. We produce one ranked set of images for each entity, so when the entire story is queried, the lists of images that are produced can effectively reproduce the story line. Further variations and implementation details are explained in Section 3.


3. SYSTEM ARCHITECTURE

The task and working methodology are as follows. In the development phase, a set of five children's short stories, annotated with the important entities and events that need illustration, was provided. Our system produces one ranked list of images corresponding to each important entity and event in a story. At a later stage, a set of 22 children's short stories was given for illustration. We provide a unique image-ranking methodology that effectively computes the importance of each picture and outputs a ranked list of images that aptly describes the story.
3.1 Dataset Analysis
In the development data set, the input query is constructed and annotated with its label using Python. The set contains five short stories. The most important entities and events that effectively summarize each story were already provided, which reduced the overhead of finding them. This information serves as the input used to query the image database and retrieve the pertinent images. In this task we use the ImageCLEF Wikipedia Image Retrieval 2010 dataset, which consists of 237,434 images along with their captions. Captions are available in English, French and/or German; complete language statistics, image files and captions can be found on the ImageCLEF website. Metadata are provided as a single metadata.zip archive split into 26 directories (1 to 26): "metadata/1" contains the XML files 0.xml to 9999.xml, "metadata/2" contains 10000.xml to 19999.xml, and so on. We have used only the English data for developing our system. We extracted important information such as the caption, description and comments from the database and created a file, which is then queried to obtain the ranked list of images corresponding to each story. The extracted information is stored in a model file. Figure 2 depicts how the information from ImageCLEF 2010 is extracted. The main components of the .xml files are <caption>, <name>, <description> and <comment>.

Given a query, the search methodology retrieves relevant pictures by analyzing the textual description found adjacent to each image (its caption) and other text-related factors such as the file name of the image, all of which are extracted in advance and stored in a model file. This extraction process converts all the XML files containing information about the images into a single text file, the "text model of the image database"; only the necessary information about each image is kept from the XML.

Figure 2: Preprocessing of the ImageCLEF dataset
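For illustration, a minimal sketch of this preprocessing step in Python is shown below; the tag names, the MODEL.TXT file name and the directory layout are assumptions made for the example, not the exact ones from our pipeline.

import glob
import xml.etree.ElementTree as ET

# Flatten each image's XML metadata record into one line of a text
# model file. The tag names follow the components listed above and
# are an assumption about the ImageCLEF schema; MODEL.TXT and the
# directory layout are illustrative names.
with open("MODEL.TXT", "w", encoding="utf-8") as out:
    for path in sorted(glob.glob("metadata/*/*.xml")):
        root = ET.parse(path).getroot()
        fields = []
        for tag in ("name", "caption", "description", "comment"):
            for node in root.iter(tag):
                if node.text and node.text.strip():
                    fields.append(node.text.strip())
        out.write(" ".join(fields) + "\n")  # one image per line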
3.2 Model Description
The input story is in XML format; it is pre-processed and converted into a simple text file, STORY.TXT, for fast searching of the data. This file contains information about the entities, the events and the entire story, and corresponds to the story/entity/event block in Figure 3. The whole set of information is passed to an extraction unit, where the important keywords, namely the entities and events, are extracted. The image extraction unit searches through MODEL.TXT, the "text model of the image database" block. The extracted information is stored for processing in a local variable referred to as the local database.

Figure 3: A block diagram showing the entire system

The extracted text in the local database is then given to the TF-IDF training model from gensim, which creates word2vec features from the documents. One model file is created for each entity in the story; these form the model block in Figure 3. The text extraction block joins the story, entities and events. The expanded text is passed to a hypernym block, which retrieves the example sentences of the hypernyms, and to a WSD block, which extracts the sense in which each entity is used in the story; the related example text is extracted from WordNet. The extracted hypernym and WSD text are added to the extracted text; this is the text expansion block. The expanded text is then passed as a query to the model file created with gensim, which maps the query and ranks all images according to their TF-IDF weights, on a 0-1 scale.

These weights are then mapped to a 0-100 scale. The mapped images are ranked, and the ranked images are written to a ".TREC" file, which is then evaluated by the FIRE 2015 organizers. The output is also generated in the form of an HTML page in which the best-matched images illustrate the story.
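A minimal sketch of this ranking step with gensim follows; the file name, the tokenization and the query text are illustrative placeholders rather than our exact implementation.

from gensim import corpora, models, similarities

# Build one TF-IDF model over the image texts (one line per image in
# the hypothetical MODEL.TXT) and rank all images for an expanded
# entity query; scores come out on the 0-1 scale mentioned above.
with open("MODEL.TXT", encoding="utf-8") as f:
    documents = [line.lower().split() for line in f]

dictionary = corpora.Dictionary(documents)
bow_corpus = [dictionary.doc2bow(doc) for doc in documents]
tfidf = models.TfidfModel(bow_corpus)
index = similarities.SparseMatrixSimilarity(
    tfidf[bow_corpus], num_features=len(dictionary))

query = "fox cunning animal".lower().split()  # hypothetical expanded entity text
scores = index[tfidf[dictionary.doc2bow(query)]]
ranked = sorted(enumerate(scores), key=lambda p: p[1], reverse=True)
print(ranked[:5])  # indices of the five best-matching images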
4. RANKING ALGORITHM
The result from TF-IDF is on a scale of 0-1, and the images have to be ranked based on these scores. We divide the values into five sub-categories, 0 to 4, using the following method. We first compute a value called Range. Once the Range is computed, we assign ranks based on it: if the value is below the Range, we assign rank "0"; if the value is below twice the Range, we assign rank "1"; below thrice the Range, rank "2"; and so on, until rank "4", the maximum possible rank. The pseudocode is given in Figure 4.

Figure 4: Algorithm describing the ranking methodology
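As a textual companion to Figure 4, the sketch below renders this binning rule in Python, under the assumption that Range is the maximum score divided by five (the text does not define Range explicitly); the score list is made up.

def assign_ranks(scores):
    # Bin TF-IDF scores (0-1) into ranks 0-4. We assume here that
    # Range = max(scores)/5, so rank k covers scores below
    # (k+1)*Range, capped at the maximum rank 4.
    rng = max(scores) / 5.0
    if rng == 0:
        return [0] * len(scores)
    return [min(int(s / rng), 4) for s in scores]

print(assign_ranks([0.05, 0.30, 0.55, 0.80, 1.00]))  # -> [0, 1, 2, 4, 4]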
5. TOOLS AND METHODS
Word embedding is a mapping of a word to a d-dimensional vector space; this real-valued vector representation captures semantic and syntactic features. We have used the Gensim toolkit to implement this and for vector space modeling in general. It is implemented in Python, and its performance can be improved using NumPy, SciPy, etc. Gensim relies on efficient online algorithms to deal with huge text collections. It includes packages for TF-IDF, latent semantic analysis (LSA) and latent Dirichlet allocation (LDA), along with distributed parallel versions, random projections, and Google's word2vec and doc2vec algorithms, and it finds application in commercial as well as academic settings [8]. We have used gensim alongside NLTK, a major platform for building Python applications related to text analytics; libraries for tokenization, stemming, tagging, parsing and much more are included in it [9].
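As a toy illustration of such an embedding with gensim (not our full pipeline), consider:

from gensim.models import Word2Vec

# Train Word2Vec on two tokenized sentences and inspect a
# 50-dimensional vector. The corpus and parameters are placeholders,
# not the ones used in our system (the vector_size parameter was
# called "size" in gensim releases of that period).
sentences = [["the", "fox", "saw", "the", "crow"],
             ["the", "crow", "held", "a", "piece", "of", "cheese"]]
model = Word2Vec(sentences, vector_size=50, window=2, min_count=1)
print(model.wv["fox"].shape)               # (50,) real-valued vector
print(model.wv.similarity("fox", "crow"))  # cosine similarity of two words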
We filter important information out of the story using the concepts of hypernymy and hyponymy. Hypernyms and hyponyms are semantic classes of words: hypernyms are broader in meaning (hyper = "over") and hyponyms are more specific (hypo = "under"). For example, "color" is a general term covering all the colors; we call it the hypernym, while purple, green, red, blue, etc. are hyponyms of "color". This relation is depicted in Figure 5.

Figure 5: Hypernym and hyponym
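A minimal sketch of this expansion step with NLTK's WordNet interface is given below; the entity, its context sentence, and the use of NLTK's built-in Lesk algorithm as the WSD component are illustrative assumptions.

import nltk
from nltk.corpus import wordnet as wn
from nltk.wsd import lesk

nltk.download("wordnet", quiet=True)  # WordNet data is required once

# Disambiguate the entity "crow" in a story sentence, then gather the
# chosen sense's gloss, examples and hypernyms for query expansion.
context = "The crow sat on a branch of a tree".split()
sense = lesk(context, "crow", pos=wn.NOUN)  # NLTK's Lesk-based WSD
if sense is not None:
    print(sense.definition())   # gloss of the selected sense
    print(sense.examples())     # example sentences, if any
    for hyper in sense.hypernyms():
        print(hyper.lemma_names(), hyper.examples())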



6. EVALUATION AND RESULT
Evaluation is conducted by means of precision-at-K (P@K) and mean average precision (MAP) against manual relevance assessments. Each important entity or event in a story has an associated relevance list, and P@K and MAP for each annotation are computed against these relevance judgments. Two teams participated, including ours. Our run is based on TF-IDF, and our methodology gave better results when evaluated using MAP and B-pref.

Table 1: Evaluation results

Measure        TFIDF-1    cguj-run1   cguj-run2   cguj-run3
num_ret        6405       92          95          100
num_rel        2068       2068        2068        2068
num_rel_ret    255        16          20          13
MAP            0.0107     0.0047      0.0053      0.003
MRR            0.1245     0.3708      0.2997      0.2504
B-pref         0.1241     0.0074      0.0095      0.0065
P@5            0.0636     0.1273      0.1545      0.0909
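For concreteness, a toy sketch of these measures follows; the run and the relevance judgments are made up, and the official scores in Table 1 come from the FIRE evaluation, not from this snippet.

def precision_at_k(ranked_ids, relevant, k):
    # Fraction of the top-k retrieved images that are relevant.
    return sum(1 for d in ranked_ids[:k] if d in relevant) / k

def average_precision(ranked_ids, relevant):
    # Mean of the precision values at each rank where a hit occurs.
    score, hits = 0.0, 0
    for i, d in enumerate(ranked_ids, start=1):
        if d in relevant:
            hits += 1
            score += hits / i
    return score / len(relevant) if relevant else 0.0

ranked = ["img3", "img7", "img1", "img9", "img2"]  # made-up run
relevant = {"img3", "img9"}                        # made-up judgments
print(precision_at_k(ranked, relevant, 5))  # 0.4
print(average_precision(ranked, relevant))  # (1/1 + 2/4) / 2 = 0.75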
7. CONCLUSION

In the proposed work we have used TF-IDF to generate a sequence of images for a given story. We have successfully implemented a text-to-picture engine that can effectively capture the core content of a story and produce the set of images that best represents it. The results are displayed in a web page, and for evaluation purposes we have also generated a ".TREC" file. The results were evaluated by the FIRE team and show reasonably good accuracy. In future, this work can be extended to create gaming units, to generate animation based on a story, to educate mentally challenged children, and to aid the rehabilitation of brain-injured patients.


ACKNOWLEDGEMENT

We would like to thank Dr. Debasis Ganguly and Mr. Iacer Calixto of the ADAPT Centre, Dublin City University (DCU), and the FIRE 2015 team for organizing such a great event and guiding us through the entire journey.


8. REFERENCES

[1] Tomlinson, Carl M., and Carol Lynch-Brown. Essentials of Children's Literature. Allyn & Bacon, 1996.
[2] "The Importance of Illustrations in Children's Books." In Illustrating for Children, edited by Mabel Segun. Ibadan: CLAN, 1988, pp. 25-27.
[3] Goldberg, Andrew B., et al. "Easy as ABC: facilitating pictorial communication via semantically enhanced layout." Proceedings of the Twelfth Conference on Computational Natural Language Learning. Association for Computational Linguistics, 2008.
[4] Zhu, Xiaojin, et al. "A text-to-picture synthesis system for augmenting communication." AAAI. Vol. 7. 2007.
[5] Joshi, Dhiraj, James Z. Wang, and Jia Li. "The Story Picturing Engine: a system for automatic text illustration." ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM) 2.1 (2006): 68-89.
[6] Feng, Yansong, and Mirella Lapata. "Topic models for image annotation and text illustration." Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics. Association for Computational Linguistics, 2010.
[7] Li, Jia, and James Z. Wang. "Automatic linguistic indexing of pictures by a statistical modeling approach." IEEE Transactions on Pattern Analysis and Machine Intelligence 25.9 (2003): 1075-1088.
[8] Řehůřek, Radim, and Petr Sojka. "Gensim: Python framework for vector space modelling." NLP Centre, Faculty of Informatics, Masaryk University, Brno, Czech Republic (2011).
[9] Bird, Steven. "NLTK: the Natural Language Toolkit." Proceedings of the COLING/ACL on Interactive Presentation Sessions. Association for Computational Linguistics, 2006.
[10] Blei, David M., Andrew Y. Ng, and Michael I. Jordan. "Latent Dirichlet allocation." Journal of Machine Learning Research 3 (2003): 993-1022.