Overview of the Automated Story Illustration Task at FIRE 2015

Debasis Ganguly, Iacer Calixto, Gareth Jones
ADAPT Centre, School of Computing, Dublin City University, Dublin, Ireland
dganguly@computing.dcu.ie, icalixto@computing.dcu.ie, gjones@computing.dcu.ie

ABSTRACT

In this paper, we present an overview of the shared task (track) carried out as part of the Forum for Information Retrieval Evaluation (FIRE) 2015 workshop. The objective of this task is to illustrate a passage of text automatically by retrieving a set of images and then inserting them at appropriate places in the text. In particular, for this track, the text to be illustrated is a set of short stories (fables) for children. The research challenges for participants developing an automated story illustration system involve developing techniques to automatically extract the concepts to be illustrated from a full story text, exploring how to use these extracted concepts for query representation in order to retrieve a ranked list of images per query, and finally investigating how to merge the ranked lists obtained from each individual concept into a single ranked list of candidate relevant images per story. In addition to reporting an overview of the approaches undertaken by the two participating groups who submitted runs for this task, we also report two of our own baseline approaches for tackling the problem of automated story illustration.

1.  INTRODUCTION

Document expansion, in addition to inserting text and hyperlinks, can also involve adding non-textual content, such as images that are topically related to the document text, in order to enhance the readability of the text. For example, in [3], Wikipedia articles are augmented with images retrieved from the Kirklees image archive, where key concepts automatically extracted from the Wiki text passages were used to formulate the queries for retrieving the images. This automatic augmentation of documents can be useful for various purposes, such as enhancing the readability of text for children, enabling them to learn and engage with the content more, or making it easier for medical students to learn more about a disease and its symptoms by looking at related images.

The aim of our work, reported in this paper, is to build up a dataset for evaluating the effectiveness of automated approaches for document expansion with images. In particular, the problem that we address in this paper is that of augmenting the text of children's short stories (e.g. fairy tales and fables) with images, in order to help improve the readability of the stories for small children, according to the adage that "a picture is worth a thousand words" (http://en.wikipedia.org/wiki/A_picture_is_worth_a_thousand_words). The "document expansion with images" methodologies developed and evaluated on this dataset can also be applied to augment other types of text documents, such as news articles, blogs, etc.

The illustration of children's stories is a particular instance of the general problem of automatic text illustration, an inherently multimodal problem that involves image processing and natural language processing. A related problem to automatic text illustration is that of automatic textual generation of image descriptions. This problem is under active research and has drawn significant research interest in recent years [2, 7, 4, 8].

The rest of the paper is organized as follows. In Section 2, we present a brief overview of the task objectives. In Section 3, we describe how the dataset (queries and relevance judgments) is constructed. Section 4 describes our own initial experiments to obtain baselines on the constructed dataset. Section 5 provides a brief overview of the approaches undertaken by the participating groups and presents the official results. Finally, Section 6 concludes the paper with directions for future work.

2.  TASK DESCRIPTION

In order to share among researchers a dataset for text augmentation with images, and to encourage them to use this dataset for research purposes, we are organizing a shared task, named "Automated Story Illustration" (http://srv-cngl.computing.dcu.ie/StoryIllustrationFireTask/), as a part of the Forum for Information Retrieval Evaluation (FIRE) 2015 workshop (http://fire.irsi.res.in/fire/). The goal of this task is to automatically illustrate children's short stories by retrieving a set of images that can be considered relevant for illustrating the concepts (agents, events and actions) of a given story.

In contrast to standard keyword-based ad-hoc search for images [1], there are no explicitly user-formulated keyword queries in this task. Instead, each text passage acts as an implicit query for which images need to be retrieved to augment it. To illustrate the task output with an example, let us consider the story "The Ant and the Grasshopper" shown in Figure 1. In the text we underline the key concepts that are likely to be used to formulate queries for illustrating the story. Additionally, we show a set of manually collected images from the results of Google image search (https://images.google.com/) executed with each of these underlined phrases as queries. It can be seen that the story with these sample images is likely to be more appealing to a child than the plain raw text. This is because, with the accompanying images, children can potentially relate to the concepts described in the text, e.g. the top left image shows a child what a "summer day's field" looks like.
    IN a field one summer's day a Grasshopper was hopping about, chirping and singing to its heart's content. An Ant passed by, bearing along with great toil an ear of corn he was taking to the nest. "Why not come and chat with me," said the Grasshopper, "instead of toiling and moiling in that way?" "I am helping to lay up food for the winter," said the Ant, "and recommend you to do the same." "Why bother about winter?" said the Grasshopper; "we have got plenty of food at present." But the Ant went on its way and continued its toil. When the winter came the Grasshopper had no food, and found itself dying of hunger, while it saw the ants distributing every day corn and grain from the stores they had collected in the summer. Then the Grasshopper knew: "IT IS BEST TO PREPARE FOR THE DAYS OF NECESSITY."

Figure 1: The story of "The Ant and the Grasshopper" with a sample annotation of images from the web. Images were manually retrieved with Google image search. The key terms used as queries in Google image search are underlined in the text.

3.  DATASET DESCRIPTION

It is worth mentioning that we use Google image search in our example of Figure 1 for illustrative purposes only. However, in order to achieve a fair comparison between automated approaches to the story illustration task, it is imperative to build up a dataset comprising a static document collection, a set of test queries (text from stories), and the relevance assessments for each story.

The static image collection that we use for this task is the ImageCLEF 2010 Wikipedia image collection [6]. For the queries, we used popular children's fairy tales, since most of them are available in the public domain and freely distributable. In particular, we make use of 22 short stories collected from "Aesop's Fables" (https://en.wikipedia.org/wiki/Aesop).

The first research challenge for an automated story illustration approach is to extract the key concepts from the text passages in order to formulate suitable queries for retrieving relevant images; e.g., an automated approach should extract "summer day field" as a meaningful unit for illustration. The second research challenge is to make use of these extracted concepts or phrases to construct queries and perform retrieval from the collection of images, which in this case is the ImageCLEF collection.

In order to allow participants to concentrate on retrieval only, we manually annotated the short stories with concepts that are likely to require illustration. The participants volunteering for the annotation task were instructed to highlight the parts of the stories that they felt would be better understood by children with the help of illustrative images. In total, five participants annotated the 22 stories: three annotated 4 stories each and the remaining two annotated 5 each. Each story was annotated by a single participant only.

Task participants who want to automatically extract the concepts from a story for the purpose of illustration are encouraged to develop automated approaches and then compare their results with the manually annotated ones. A participating system may use shallow natural language processing (NLP) techniques, such as named entity recognition and chunking, to first identify individual query concepts and then retrieve candidate images for each of these. Another approach may be to use the entire story text as the query and then cluster the result list of documents to identify the individual query components.
An important component of an information retrieval (IR) dataset is the set of relevance assessments for each query. To obtain the set of relevant images for each story, we undertake the standard pooling procedure of IR, where a pool of documents, i.e. the set of top-ranked documents from retrieval systems with different settings, is assessed manually for relevance. The relevance judgements for our dataset are obtained as follows.

Firstly, in order to be able to search for images with ad-hoc keywords, we indexed the ImageCLEF collection. In particular, the text extracted from the caption of each image in the ImageCLEF collection was indexed as a retrievable document. The collection was indexed with Lucene (https://lucene.apache.org/), an open source IR system in Java.

Secondly, we use each manually annotated concept as an individual query that is executed on the ImageCLEF document collection. To construct the pool, we obtain runs with different retrieval models, such as BM25, LM and tf-idf with default parameter settings in Lucene, and finally fuse the ranked lists with the standard COMBSUM merging technique.

Finally, the top 20 documents from this fused ranked list were assessed for relevance. The relevance assessment for each manually annotated concept of each story was conducted by the same participant who created the annotation in the first place. This ensured that the participants had a clear understanding of the relevance criteria. The participants were asked to assign relevance on a five-point scale ranging from absolutely non-relevant to highly relevant.
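The following is a minimal sketch of the pool construction step just described, assuming each retrieval model has already produced a ranked list of image identifiers with scores. The function name combsum, the toy runs and the min-max score normalisation are assumptions for illustration; the actual pools were built from Lucene runs.

    from collections import defaultdict

    def combsum(ranked_lists, pool_depth=20):
        """Fuse ranked lists (dicts of image id -> retrieval score) with COMBSUM
        and return the top pool_depth image ids as the assessment pool."""
        fused = defaultdict(float)
        for run in ranked_lists:
            if not run:
                continue
            lo, hi = min(run.values()), max(run.values())
            for image_id, score in run.items():
                # Min-max normalise each run so that BM25, LM and tf-idf scores
                # are comparable before summing (an assumption here; plain
                # COMBSUM sums the raw scores).
                fused[image_id] += (score - lo) / (hi - lo) if hi > lo else 0.0
        ranking = sorted(fused.items(), key=lambda kv: kv[1], reverse=True)
        return [image_id for image_id, _ in ranking[:pool_depth]]

    # Toy runs for a single annotated concept; identifiers are made up.
    bm25_run  = {"img_12": 14.2, "img_7": 11.9, "img_31": 9.4}
    lm_run    = {"img_7": -4.1, "img_12": -4.8, "img_55": -5.3}
    tfidf_run = {"img_31": 0.62, "img_12": 0.57, "img_7": 0.41}
    print(combsum([bm25_run, lm_run, tfidf_run]))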
4.  OUR BASELINES

In this section, we describe some initial experiments that we conducted on our dataset, meant to act as baselines for future work. As our first baseline, we simply use all the words in a story to create a query. We then use this query to retrieve a list of images based on the similarity of the query with the caption texts of the images in the index. The retrieval model that we use is the LM with Jelinek-Mercer smoothing [5]. As a second baseline, we still use all the words in the story, but this time weight each query term by its tf-idf score. It is worth mentioning here that the two baselines that we use are quite simple because our intention is to see how simple methods perform before attempting to apply more involved approaches to this task.
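The following sketch illustrates the two baselines on a toy caption index, assuming simple whitespace tokenisation. It is not the actual Lucene configuration we used: the toy captions, the smoothing parameter lam=0.5 and the exact tf-idf weighting formula are illustrative assumptions.

    import math
    from collections import Counter

    # Toy stand-in for the indexed ImageCLEF caption documents.
    captions = {
        "img_1": "an ant carrying an ear of corn across a field",
        "img_2": "a green grasshopper singing in the summer grass",
        "img_3": "a snow covered field in winter",
    }
    doc_tf   = {d: Counter(text.split()) for d, text in captions.items()}
    coll_tf  = sum(doc_tf.values(), Counter())    # collection term frequencies
    coll_len = sum(coll_tf.values())
    num_docs = len(captions)
    df       = Counter(t for tf in doc_tf.values() for t in tf)

    def lm_jm_score(query_terms, doc, lam=0.5):
        """Baseline 1: query likelihood with Jelinek-Mercer smoothing [5]."""
        tf, dlen = doc_tf[doc], sum(doc_tf[doc].values())
        score = 0.0
        for t in query_terms:
            p = lam * tf[t] / dlen + (1 - lam) * coll_tf[t] / coll_len
            if p > 0:                  # skip terms unseen in the whole collection
                score += math.log(p)
        return score

    def tfidf_weighted_score(query_terms, doc):
        """Baseline 2: each query (story) term weighted by its tf-idf score."""
        qtf = Counter(query_terms)
        return sum(qtf[t] * math.log(1 + num_docs / df[t]) * doc_tf[doc][t]
                   for t in set(query_terms) if df[t] > 0)

    story_query = "the grasshopper was singing in the summer field".split()
    for doc in captions:
        print(doc, round(lm_jm_score(story_query, doc), 3),
              round(tfidf_weighted_score(story_query, doc), 3))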
     Approach                      MAP      P@5      P@10
     Unweighted qry terms          0.0275   0.1048   0.0905
     tf-idf weighted qry terms     0.0529   0.1714   0.1238

Table 1: Retrieval effectiveness of simple baseline approaches averaged over the 22 stories.

In Table 1, we observe that simply using all the terms of a story as a query to retrieve a ranked list of images does not produce satisfactory results. In contrast, even a very simple approach of weighting the terms in the text of the story by their tf-idf weights can produce a significant improvement in the results. We believe that shallow NLP techniques to extract useful concepts can further improve the results.
5.  SUBMITTED RUNS

Two participating groups submitted runs for this task; the details of each group are shown in Table 2. The first group (Group 1) employed a word embedding based approach to expand the annotated concepts of each story in order to formulate a query and retrieve a ranked list of images. Only the text of the image captions was used for computing similarities with the queries; the similarity function employed was tf-idf. The second group (Group 2) used Terrier for indexing the ImageCLEF 2010 collection. For retrieval, they applied the Divergence from Randomness (DFR) similarity function of Terrier.
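The working notes do not detail Group 1's expansion procedure beyond the description above; the following is a generic sketch of embedding-based concept expansion, assuming pre-trained word vectors are available as a Python dictionary. The toy vectors, the function expand_concept and the choice of k nearest neighbours are illustrative assumptions.

    import math

    # Toy word vectors; a real system would load pre-trained embeddings from a
    # file (the working notes do not say which embeddings Group 1 used).
    vectors = {
        "grasshopper": [0.9, 0.1, 0.0],
        "cricket":     [0.8, 0.2, 0.1],
        "insect":      [0.7, 0.3, 0.2],
        "winter":      [0.0, 0.9, 0.4],
        "snow":        [0.1, 0.8, 0.5],
    }

    def cosine(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        norm_u = math.sqrt(sum(a * a for a in u))
        norm_v = math.sqrt(sum(b * b for b in v))
        return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

    def expand_concept(concept_terms, k=2):
        """Expand an annotated concept with the k nearest neighbours (by cosine
        similarity) of each of its terms in the embedding space."""
        expanded = list(concept_terms)
        for term in concept_terms:
            if term not in vectors:
                continue
            neighbours = sorted((w for w in vectors if w != term),
                                key=lambda w: cosine(vectors[term], vectors[w]),
                                reverse=True)
            expanded.extend(neighbours[:k])
        return expanded

    print(expand_concept(["grasshopper", "winter"]))
    # e.g. ['grasshopper', 'winter', 'cricket', 'insect', 'snow', ...]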
     Grp  Affiliation                                               #members
     1    Amrita Vishwa Vidyapeetham, Coimbatore, India                 3
     2    i) Charotar University of Science and Technology,             4
          Anand, India; ii) L.D.R.P. College, Gandhinagar, India;
          iii) Gujarat University, Ahmedabad, India

Table 2: Participating groups for the FIRE Automated Story Illustration task 2015.

     Grp Id   Run Id   #ret    #relret   MAP      MRR      B-pref   P@5
     1        1        6405    255       0.0107   0.1245   0.1241   0.0636
     2        1        92      16        0.0047   0.3708   0.0074   0.1273
     2        2        95      20        0.0053   0.2997   0.0095   0.1545
     2        3        100     13        0.0030   0.2504   0.0065   0.0909

Table 3: Official results of the FIRE Automated Story Illustration task 2015. The evaluation measures are averaged over the set of 22 stories (#rel: 2068).

Table 3 shows the official results for the runs submitted by the two participating groups. Each participating group was allowed to submit three runs. While Group 1 submitted only one run, Group 2 submitted three. It can be seen that the run submitted by Group 1 comprises a much higher number of retrieved documents (6405) than the submitted runs of Group 2 (about 100). Due to the higher average number of retrieved images per story for Group 1 (6405/22 ≈ 291) compared to Group 2 (100/22 ≈ 4.5), Group 1 achieves higher recall and MAP (compare the #relret and MAP values in Table 3). However, the submitted runs from Group 2 scored higher on precision, e.g. compare the MRR and P@5 values between the runs of the two groups.

A comparison of the official results and our own baselines (see Tables 1 and 3) shows that none of the submitted runs were able to outperform the simple baseline approaches that we experimented with. More investigation, which we leave for future work, is required to explain this observation.
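For reference, the following sketch shows how the rank-based measures reported in Tables 1 and 3 (P@k, average precision and reciprocal rank) can be computed for a single story before averaging over the 22 stories. Graded judgements are assumed to be binarised, and the toy ranking and judgement set are illustrative only.

    def precision_at_k(ranking, relevant, k):
        """Fraction of the top-k retrieved images that are relevant."""
        return sum(1 for image_id in ranking[:k] if image_id in relevant) / k

    def average_precision(ranking, relevant):
        """Average of the precision values at the ranks of the relevant images."""
        hits, total = 0, 0.0
        for rank, image_id in enumerate(ranking, start=1):
            if image_id in relevant:
                hits += 1
                total += hits / rank
        return total / len(relevant) if relevant else 0.0

    def reciprocal_rank(ranking, relevant):
        """1/rank of the first relevant image, or 0 if none is retrieved."""
        for rank, image_id in enumerate(ranking, start=1):
            if image_id in relevant:
                return 1.0 / rank
        return 0.0

    # Toy run and (binarised) judgements for a single story.
    ranking  = ["img_3", "img_9", "img_1", "img_7", "img_2"]
    relevant = {"img_1", "img_2", "img_8"}
    print(precision_at_k(ranking, relevant, 5),
          average_precision(ranking, relevant),
          reciprocal_rank(ranking, relevant))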
6.  CONCLUSIONS AND FUTURE WORK

In this paper, we describe the construction of a dataset for the purpose of evaluating automated approaches for document augmentation with images. In particular, we address the problem of automatically illustrating children's stories. Our constructed dataset comprises 22 children's stories as the set of queries and uses the ImageCLEF document collection as the set of retrievable images. The dataset also comprises manually annotated concepts in each story that can potentially be used as queries to retrieve a collection of relevant images for each story. In fact, the retrieval results obtained with the manual annotations can act as strong baselines against which to compare approaches that automatically extract the concepts from a story. The dataset contains the relevance assessments for each story, obtained with pooling to a depth of 20.

Our initial experiments suggest that the dataset can be used to compare and evaluate various approaches to automated augmentation of documents with images. We demonstrate that tf-idf based weighting of the query terms can prove useful in improving retrieval effectiveness, leaving effective query representation open as a direction of future research for this task.

References

[1] B. Caputo, H. Müller, J. Martínez-Gómez, M. Villegas, B. Acar, N. Patricia, N. B. Marvasti, S. Üsküdarli, R. Paredes, M. Cazorla, I. García-Varea, and V. Morell. ImageCLEF 2014: Overview and analysis of the results. In Information Access Evaluation. Multilinguality, Multimodality, and Interaction - 5th International Conference of the CLEF Initiative, CLEF 2014, Sheffield, UK, September 15-18, 2014. Proceedings, pages 192–211, 2014.

[2] Y. Feng and M. Lapata. Topic models for image annotation and text illustration. In Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, HLT '10, pages 831–839, Stroudsburg, PA, USA, 2010. Association for Computational Linguistics.

[3] M. M. Hall, P. D. Clough, O. L. de Lacalle, A. Soroa, and E. Agirre. Enabling the discovery of digital cultural heritage objects through Wikipedia. In Proceedings of the 6th Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities, LaTeCH '12, pages 94–100, Stroudsburg, PA, USA, 2012. Association for Computational Linguistics.
[4] A. Karpathy and L. Fei-Fei. Deep visual-semantic
    alignments for generating image descriptions. CoRR,
    abs/1412.2306, 2014.
[5] J. M. Ponte and W. B. Croft. A language modeling
    approach to information retrieval. In SIGIR, pages 275–
    281. ACM, 1998.
[6] A. Popescu, T. Tsikrika, and J. Kludas. Overview of the Wikipedia retrieval task at ImageCLEF 2010. In M. Braschler, D. Harman, and E. Pianta, editors, CLEF (Notebook Papers/LABs/Workshops), 2010.
[7] O. Vinyals, A. Toshev, S. Bengio, and D. Erhan. Show
    and tell: A neural image caption generator. CoRR,
    abs/1411.4555, 2014.
[8] K. Xu, J. Ba, R. Kiros, K. Cho, A. C. Courville,
    R. Salakhutdinov, R. S. Zemel, and Y. Bengio. Show,
    attend and tell: Neural image caption generation with
    visual attention. CoRR, abs/1502.03044, 2015.



