=Paper=
{{Paper
|id=Vol-1587/T3-1
|storemode=property
|title=Overview of the Automated Story Illustration Task at FIRE 2015
|pdfUrl=https://ceur-ws.org/Vol-1587/T3-1.pdf
|volume=Vol-1587
|authors=Debasis Ganguly,Iacer Calixto,Gareth Jones
|dblpUrl=https://dblp.org/rec/conf/fire/GangulyCJ15
}}
==Overview of the Automated Story Illustration Task at FIRE 2015==
Debasis Ganguly, Iacer Calixto, Gareth Jones
ADAPT Centre, School of Computing, Dublin City University, Dublin, Ireland
dganguly@computing.dcu.ie, icalixto@computing.dcu.ie, gjones@computing.dcu.ie

ABSTRACT

In this paper, we present an overview of the shared task (track) carried out as part of the Forum of Information Retrieval and Evaluation (FIRE) 2015 workshop. The objective of this task is to illustrate a passage of text automatically by retrieving a set of images and then inserting them at appropriate places in the text. In particular, for this track, the text to be illustrated is a set of short stories (fables) for children. The research challenges for participants developing an automated story illustration system include developing techniques to automatically extract the concepts to be illustrated from a full story text, exploring how to use these extracted concepts for query representation in order to retrieve a ranked list of images per query, and finally investigating how to merge the ranked lists obtained from the individual concepts into a single ranked list of candidate relevant images per story. In addition to reporting an overview of the approaches undertaken by the two participating groups who submitted runs for this task, we also report two of our own baseline approaches for tackling the problem of automated story illustration.

1. INTRODUCTION

Document expansion, in addition to inserting text and hyperlinks, can also involve adding non-textual content, such as images that are topically related to the document text, in order to enhance its readability. For example, in [3], Wikipedia articles are augmented with images retrieved from the Kirklees image archive, where key concepts automatically extracted from the Wiki text passages were used to formulate the queries for retrieving the images. This automatic augmentation of documents can be useful for various purposes, such as enhancing the readability of text for children, enabling them to learn from and engage with the content more, or making it easier for medical students to learn more about a disease or its symptoms by looking at related images.

The aim of our work, reported in this paper, is to build a dataset for evaluating the effectiveness of automated approaches for document expansion with images. In particular, the problem that we address in this paper is that of augmenting the text of children's short stories (e.g. fairy tales and fables) with images in order to help improve the readability of the stories for small children, following the adage that "a picture is worth a thousand words" (http://en.wikipedia.org/wiki/A_picture_is_worth_a_thousand_words). The "document expansion with images" methodologies developed and evaluated on this dataset can also be applied to augment other types of text documents, such as news articles and blogs.

The illustration of children's stories is a particular instance of the general problem of automatic text illustration, an inherently multimodal problem that involves image processing and natural language processing. A related problem to automatic text illustration is the automatic textual generation of image descriptions, which is under active research and has drawn significant research interest in recent years [2, 7, 4, 8].

The rest of the paper is organized as follows. In Section 2, we present a brief overview of the task objectives. In Section 3, we describe how the dataset (queries and relevance judgments) is constructed. Section 4 describes our own initial experiments, carried out to obtain baselines on the constructed dataset. Section 5 provides a brief overview of the approaches undertaken by the participating groups and presents the official results. Finally, Section 6 concludes the paper with directions for future work.
2. TASK DESCRIPTION

In order to share with researchers a dataset for text augmentation with images, and to encourage them to use this dataset for research purposes, we organized a shared task named "Automated Story Illustration" (http://srv-cngl.computing.dcu.ie/StoryIllustrationFireTask/) as a part of the Forum of Information Retrieval and Evaluation (FIRE) 2015 workshop (http://fire.irsi.res.in/fire/). The goal of this task is to automatically illustrate children's short stories by retrieving a set of images that can be considered relevant for illustrating the concepts (agents, events and actions) of a given story.

In contrast to standard keyword-based ad-hoc search for images [1], there are no explicitly user-formulated keyword queries in this task. Instead, each text passage acts as an implicit query for which images need to be retrieved to augment it. To illustrate the task output with an example, consider the story "The Ant and the Grasshopper" shown in Figure 1. In the text, we underline the key concepts that are likely to be used to formulate queries for illustrating the story. Additionally, we show a set of images manually collected from the results of Google image search (https://images.google.com/) executed with each of these underlined phrases as queries. It can be seen that the story with these sample images is likely to be more appealing to a child than the plain raw text. This is because, with the accompanying images, children can potentially relate to the concepts described in the text; e.g. the top-left image shows a child what a "summer day's field" looks like.

Figure 1: The story of "The Ant and the Grasshopper" with a sample annotation of images from the web. Images were manually retrieved with Google image search. The key terms used as queries in Google image search are underlined in the text. The story text reads: "In a field one summer's day a Grasshopper was hopping about, chirping and singing to its heart's content. An Ant passed by, bearing along with great toil an ear of corn he was taking to the nest. 'Why not come and chat with me,' said the Grasshopper, 'instead of toiling and moiling in that way?' 'I am helping to lay up food for the winter,' said the Ant, 'and recommend you to do the same.' 'Why bother about winter?' said the Grasshopper; 'we have got plenty of food at present.' But the Ant went on its way and continued its toil. When the winter came the Grasshopper had no food, and found itself dying of hunger, while it saw the ants distributing every day corn and grain from the stores they had collected in the summer. Then the Grasshopper knew: 'It is best to prepare for the days of necessity.'"

3. DATASET DESCRIPTION

It is worth mentioning that we use Google image search in our example of Figure 1 for illustrative purposes only. However, in order to achieve a fair comparison between automated approaches to the story illustration task, it is imperative to build a dataset comprised of a static document collection, a set of test queries (text from stories), and the relevance assessments for each story.

The static image collection that we use for this task is the ImageCLEF 2010 Wikipedia image collection [6]. For the queries, we used popular children's fairy tales, since most of them are in the public domain and freely distributable. In particular, we make use of 22 short stories collected from "Aesop's Fables" (https://en.wikipedia.org/wiki/Aesop).

The first research challenge for an automated story illustration approach is to extract the key concepts from the text passages in order to formulate suitable queries for retrieving relevant images; e.g. an automated approach should extract "summer day field" as a meaningful unit for illustration. The second research challenge is to make use of these extracted concepts or phrases to construct queries and perform retrieval from the collection of images, which in this case is the ImageCLEF collection.

In order to allow participants to concentrate on retrieval only, we manually annotated the short stories with concepts that are likely to require illustration. The participants who volunteered for the annotation task were instructed to highlight the parts of the stories that they felt would be better understood by children with the help of illustrative images. In total, five participants annotated the 22 stories: three annotated 4 stories each and the remaining two annotated 5 each. Each story was annotated by a single participant only. Participants who wish to automatically extract the concepts from a story for the purpose of illustration were encouraged to develop automated approaches and compare their results against the manual annotations. A participating system may, for instance, use shallow natural language processing (NLP) techniques, such as named entity recognition and chunking, to first identify individual query concepts and then retrieve candidate images for each of them. Another approach may be to use the entire story text as a query and then cluster the result list of documents to identify the individual query components.
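To make the shallow-NLP option concrete, the following is a minimal sketch of noun-phrase-based concept extraction, assuming Python with NLTK and a simple chunk grammar. It is only one possible realisation of the idea and is not the procedure used to create the official manually annotated queries.

```python
# Minimal sketch: extract candidate noun-phrase concepts from a story with NLTK.
# This only illustrates the "shallow NLP" option mentioned above; the official
# queries for the task were produced by manual annotation, not by this script.
import nltk

# One-off downloads needed for the tokenizer and tagger:
# nltk.download('punkt'); nltk.download('averaged_perceptron_tagger')

GRAMMAR = "NP: {<DT>?<JJ>*<NN.*>+}"  # optional determiner, adjectives, one or more nouns

def extract_concepts(story_text):
    """Return candidate noun-phrase concepts, e.g. 'summer day field'."""
    chunker = nltk.RegexpParser(GRAMMAR)
    concepts = []
    for sentence in nltk.sent_tokenize(story_text):
        tagged = nltk.pos_tag(nltk.word_tokenize(sentence))
        tree = chunker.parse(tagged)
        for subtree in tree.subtrees(filter=lambda t: t.label() == "NP"):
            phrase = " ".join(word for word, _ in subtree.leaves())
            concepts.append(phrase.lower())
    return concepts

if __name__ == "__main__":
    print(extract_concepts("In a field one summer's day a Grasshopper was hopping about."))
```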
An important component of an information retrieval (IR) dataset is the set of relevance assessments for each query. To obtain the set of relevant images for each story, we undertake the standard IR pooling procedure, in which a pool of documents, i.e. the set of top-ranked documents from retrieval systems with different settings, is assessed manually for relevance. The relevance judgements for our dataset are obtained as follows.

Firstly, in order to be able to search for images with ad-hoc keywords, we indexed the ImageCLEF collection. In particular, the text extracted from the caption of each image in the ImageCLEF collection was indexed as a retrievable document. The collection was indexed with Lucene (https://lucene.apache.org/), an open-source IR system written in Java.

Secondly, we used each manually annotated concept as an individual query executed against the ImageCLEF document collection. To construct the pool, we obtained runs with different retrieval models, namely BM25, LM and tf-idf, with default parameter settings in Lucene, and finally fused the ranked lists with the standard COMBSUM merging technique.

Finally, the top 20 documents from this fused ranked list were assessed for relevance. The relevance assessment for each manually annotated concept of each story was conducted by the same participant who created the annotation in the first place. This ensured that the participants had a clear understanding of the relevance criteria. The participants were asked to assign relevance on a five-point scale ranging from absolutely non-relevant to highly relevant.
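As a rough illustration of the fusion and pooling step described above, the sketch below fuses several ranked lists with COMBSUM and keeps the top of the fused list as the assessment pool. It assumes each run is a Python dict mapping image identifiers to retrieval scores; the per-run min-max normalisation is a common convention but an assumption here, since the paper does not specify how the Lucene scores were combined.

```python
# Sketch of COMBSUM fusion and top-k pooling over several ranked lists.
# Each run maps image_id -> retrieval score (e.g. from the BM25, LM and tf-idf runs).

def combsum(runs):
    """Fuse runs by summing per-run min-max normalised scores (normalisation assumed)."""
    fused = {}
    for run in runs:
        lo, hi = min(run.values()), max(run.values())
        span = (hi - lo) or 1.0
        for doc_id, score in run.items():
            fused[doc_id] = fused.get(doc_id, 0.0) + (score - lo) / span
    return fused

def pool(runs, depth=20):
    """The top `depth` images of the fused list form the pool to be judged."""
    ranked = sorted(combsum(runs).items(), key=lambda kv: kv[1], reverse=True)
    return [doc_id for doc_id, _ in ranked[:depth]]

# Toy example with made-up scores from three retrieval models:
bm25 = {"img1": 12.3, "img2": 9.1, "img3": 4.0}
lm = {"img2": -3.2, "img4": -3.5, "img1": -4.0}
tfidf = {"img1": 0.8, "img5": 0.6, "img2": 0.3}
print(pool([bm25, lm, tfidf], depth=3))
```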
4. OUR BASELINES

In this section, we describe some initial experiments that we conducted on our dataset, intended to act as baselines for future work. As our first baseline, we simply use all the words in a story to create a query. We then use this query to retrieve a list of images, based on the similarity of the query with the caption texts of the images in the index. The retrieval model that we use is the LM with Jelinek-Mercer smoothing [5]. As a second baseline, we still use all the words in the story, but this time weight each query term by its tf-idf score. It is worth mentioning that the two baselines we use are quite simple, because our intention is to see how well simple methods can perform before attempting more involved approaches to this task.

Table 1: Retrieval effectiveness of simple baseline approaches averaged over 22 stories.

| Approach | MAP | P@5 | P@10 |
|---|---|---|---|
| Unweighted query terms | 0.0275 | 0.1048 | 0.0905 |
| tf-idf weighted query terms | 0.0529 | 0.1714 | 0.1238 |

In Table 1, we observe that simply using all terms of a story as a query to retrieve a ranked list of images does not produce satisfactory results. In contrast, even the very simple approach of weighting the terms of the story text by their tf-idf weights produces a significant improvement in the results. We believe that shallow NLP techniques for extracting useful concepts can improve the results further.
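For illustration, the sketch below shows the kind of tf-idf query-term weighting used in the second baseline, assuming caption texts are available as plain strings. The official baselines were run inside Lucene with its LM and tf-idf implementations; this stand-alone version and its exact scoring function are assumptions made for clarity only.

```python
# Rough sketch of the second baseline's idea: weight every story term by tf-idf
# (idf computed over the caption collection) and rank captions by weighted overlap.
import math
from collections import Counter

def tfidf_query(story, captions):
    """Return {term: tf-idf weight} for the story, with idf taken from the captions."""
    n = len(captions)
    df = Counter()
    for cap in captions:
        df.update(set(cap.lower().split()))
    tf = Counter(story.lower().split())
    return {t: tf[t] * math.log((n + 1) / (df[t] + 1)) for t in tf}

def rank_captions(story, captions):
    """Rank caption documents by the sum of the weights of matched query terms."""
    weights = tfidf_query(story, captions)
    scores = []
    for idx, cap in enumerate(captions):
        terms = set(cap.lower().split())
        scores.append((sum(w for t, w in weights.items() if t in terms), idx))
    return sorted(scores, reverse=True)
```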
5. SUBMITTED RUNS

Two participating groups submitted runs for this task. The details of each group are shown in Table 2.

Table 2: Participating groups in the FIRE 2015 Automated Story Illustration task.

| Group | Affiliation | #members |
|---|---|---|
| 1 | Amrita Vishwa Vidyapeetham, Coimbatore, India | 3 |
| 2 | i) Charotar University of Science and Technology, Anand, India; ii) L.D.R.P. College, Gandhinagar, India; iii) Gujarat University, Ahmedabad, India | 4 |

The first group (Group 1) employed a word-embedding based approach to expand the annotated concepts of each story in order to formulate a query and retrieve a ranked list of images. Only the text of the image captions was used for computing similarities with the queries; the similarity function employed was tf-idf. The second group (Group 2) used Terrier for indexing the ImageCLEF 2010 collection. For retrieval, they applied Terrier's Divergence from Randomness (DFR) similarity function.
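Group 1's pipeline is not described beyond the summary above; the following sketch only illustrates the general idea of expanding an annotated concept with nearest neighbours from a pre-trained word-embedding model before retrieval. The use of gensim, the expansion strategy, and the model file name are all assumptions, not details of the submitted run.

```python
# Illustrative sketch only: expand an annotated concept with nearest-neighbour
# terms from a pre-trained word-embedding model before using it as a query.
# Group 1's actual expansion method, model and parameters are not specified in
# their run description; 'vectors.bin' below is a hypothetical file name.
from gensim.models import KeyedVectors

def expand_concept(concept, kv, per_term=3):
    """Append the top `per_term` embedding neighbours of each concept word."""
    expanded = list(concept.lower().split())
    for term in concept.lower().split():
        if term in kv:
            expanded += [w for w, _ in kv.most_similar(term, topn=per_term)]
    return " ".join(expanded)

if __name__ == "__main__":
    kv = KeyedVectors.load_word2vec_format("vectors.bin", binary=True)
    print(expand_concept("summer day field", kv))
```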
Table 3 shows the official results for the runs submitted by the two participating groups. Each participating group was allowed to submit up to three runs; Group 1 submitted only one run, while Group 2 submitted three. It can be seen that the run submitted by Group 1 comprises a much higher number of retrieved documents (6405) than the runs submitted by Group 2 (about 100 each). Due to the higher average number of retrieved images per story for Group 1 (6405/22 ≈ 291) compared to Group 2 (100/22 ≈ 4.5), Group 1 achieves higher recall and MAP (compare the #relret and MAP values in Table 3). However, the runs submitted by Group 2 scored higher on precision; compare, for example, the MRR and P@5 values between the runs of the two groups.

Table 3: Official results of the FIRE 2015 Automated Story Illustration task. The evaluation measures are averaged over the set of 22 stories (#rel: 2068).

| Grp Id | Run Id | #ret | #relret | MAP | MRR | B-pref | P@5 |
|---|---|---|---|---|---|---|---|
| 1 | 1 | 6405 | 255 | 0.0107 | 0.1245 | 0.1241 | 0.0636 |
| 2 | 1 | 92 | 16 | 0.0047 | 0.3708 | 0.0074 | 0.1273 |
| 2 | 2 | 95 | 20 | 0.0053 | 0.2997 | 0.0095 | 0.1545 |
| 2 | 3 | 100 | 13 | 0.0030 | 0.2504 | 0.0065 | 0.0909 |

A comparison of the official results with our own baselines (see Tables 3 and 1) shows that none of the submitted runs were able to outperform the simple baseline approaches that we experimented with. More investigation is required to explain this observation, which we leave for future work.

6. CONCLUSIONS AND FUTURE WORK

In this paper, we describe the construction of a dataset for evaluating automated approaches to document augmentation with images. In particular, we address the problem of automatically illustrating children's stories. Our constructed dataset comprises 22 children's stories as the set of queries and uses the ImageCLEF document collection as the set of retrievable images. The dataset also contains manually annotated concepts in each story that can be used as queries to retrieve a collection of relevant images for that story. In fact, the retrieval results obtained with the manual annotations can act as strong baselines against which to compare approaches that automatically extract the concepts from a story. The dataset contains the relevance assessments for each story, obtained with pooling to a depth of 20.

Our initial experiments suggest that the dataset can be used to compare and evaluate various approaches to the automated augmentation of documents with images. We demonstrate that tf-idf based weighting of the query terms can prove useful in improving retrieval effectiveness, leaving open future directions of research on effective query representation for this task.

References

[1] B. Caputo, H. Müller, J. Martínez-Gómez, M. Villegas, B. Acar, N. Patricia, N. B. Marvasti, S. Üsküdarli, R. Paredes, M. Cazorla, I. García-Varea, and V. Morell. ImageCLEF 2014: Overview and analysis of the results. In Information Access Evaluation. Multilinguality, Multimodality, and Interaction: 5th International Conference of the CLEF Initiative, CLEF 2014, Sheffield, UK, September 15-18, 2014, Proceedings, pages 192–211, 2014.

[2] Y. Feng and M. Lapata. Topic models for image annotation and text illustration. In Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, HLT '10, pages 831–839, Stroudsburg, PA, USA, 2010. Association for Computational Linguistics.

[3] M. M. Hall, P. D. Clough, O. L. de Lacalle, A. Soroa, and E. Agirre. Enabling the discovery of digital cultural heritage objects through Wikipedia. In Proceedings of the 6th Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities, LaTeCH '12, pages 94–100, Stroudsburg, PA, USA, 2012. Association for Computational Linguistics.

[4] A. Karpathy and L. Fei-Fei. Deep visual-semantic alignments for generating image descriptions. CoRR, abs/1412.2306, 2014.

[5] J. M. Ponte and W. B. Croft. A language modeling approach to information retrieval. In SIGIR, pages 275–281. ACM, 1998.

[6] A. Popescu, T. Tsikrika, and J. Kludas. Overview of the Wikipedia retrieval task at ImageCLEF 2010. In M. Braschler, D. Harman, and E. Pianta, editors, CLEF (Notebook Papers/LABs/Workshops), 2010.

[7] O. Vinyals, A. Toshev, S. Bengio, and D. Erhan. Show and tell: A neural image caption generator. CoRR, abs/1411.4555, 2014.

[8] K. Xu, J. Ba, R. Kiros, K. Cho, A. C. Courville, R. Salakhutdinov, R. S. Zemel, and Y. Bengio. Show, attend and tell: Neural image caption generation with visual attention. CoRR, abs/1502.03044, 2015.