=Paper=
{{Paper
|id=Vol-3144/RP-paper12
|storemode=property
|title=StoryBook - A Storytelling-based Platform for Digital Book Stores
|pdfUrl=https://ceur-ws.org/Vol-3144/RP-paper12.pdf
|volume=Vol-3144
|authors=Eleonora Bernasconi,Miguel Ceriani,Francesca De Luzi,Clara Di Fazio,Roberto Marcucci,Massimo Mecella,Fabio Petriccione,Francesco Sapio
|dblpUrl=https://dblp.org/rec/conf/rcis/BernasconiCLFMM22
}}
==StoryBook - A Storytelling-based Platform for Digital Book Stores==
<pdf width="1500px">https://ceur-ws.org/Vol-3144/RP-paper12.pdf</pdf>
<pre>
StoryBook - A Storytelling-based Platform for Digital
Book Stores
Eleonora Bernasconi¹, Miguel Ceriani², Francesca De Luzi¹, Clara Di Fazio³,
Roberto Marcucci⁴, Massimo Mecella¹, Fabio Petriccione³ and Francesco Sapio¹
¹ Sapienza Università di Roma, Dipartimento di Ingegneria informatica, automatica e gestionale Antonio Ruberti, via
  Ariosto, 25, 00185 Rome, Italy
² Università degli Studi di Bari Aldo Moro, Dipartimento di Informatica, via Edoardo Orabona, 4, 70125 Bari, Italy
³ ATON IT - Innovation Technology, via Fiume Giallo, 3, 00144 Rome, Italy
⁴ L’Erma di Bretschneider, via Marianna Dionigi, 57, 00193 Rome, Italy


Abstract
The StoryBook project stems from the cooperation between Sapienza Università di Roma, the company
Aton IT and the historic publishing house L’Erma di Bretschneider. The project deals with the broad
area of Digital Humanities and has favored the development of a prototype system for semi-automatic
video trailer generation of books.

Keywords
storytelling, knowledge extraction, digital library


1. Introduction
StoryBook is a research project in the field of digital humanities, which proposes extracting and
managing the knowledge contained in a digital library to create video trailers of books
semi-automatically. StoryBook was originally conceived to meet the needs of the “L’Erma di
Bretschneider” publishing house, which deals with topics related to ancient history and archaeology.
From the publishers’ point of view, promotional trailers respond to a changing market with a strong
focus on digital and visual media. The goal of a digital presentation of a book is nevertheless
broader than selling it and includes providing helpful information to the potential future reader.
A publishing house wants to disclose the contents of its digital library not only to experts in the
sector but also to interested people attracted by the contents of its books shown on the Web in
the form of searching tools or advertising such as video trailers. Several studies show that
a book trailer fosters the desire to learn and the level of motivation to read [4, 5, 6, 7]. StoryBook
is a software tool to support the creation of book trailers by collecting and organizing relevant
video content. The system users retain control over how to edit and compose the content. The
proposed technique aims at semi-automatically building digital trailers that allow viewers who are
generically interested in a specialized topic, but are not experts, to better appreciate the topic, both
for their own cultural/professional enrichment and for a possible purchase.

Joint Proceedings of RCIS 2022 Workshops and Research Projects Track, May 17-20, 2022, Barcelona, Spain
Email: bernasconi@diag.uniroma1.it (E. Bernasconi); miguel.ceriani@uniba.it (M. Ceriani); deluzi@diag.uniroma1.it
(F. De Luzi); clara.difazio@atoninformatica.it (C. Di Fazio); roberto.marcucci@lerma.it (R. Marcucci);
mecella@diag.uniroma1.it (M. Mecella); fabio.petriccione@atoninformatica.it (F. Petriccione);
sapio@diag.uniroma1.it (F. Sapio)
ORCID: 0000-0003-3142-3084 (E. Bernasconi); 0000-0002-5074-2112 (M. Ceriani); 0000-0002-9896-2528 (F. De Luzi);
0000-0002-9730-8882 (M. Mecella); 0000-0001-9544-2108 (F. Sapio)
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (http://ceur-ws.org), ISSN 1613-0073


2. Summary of project results
With the StoryBook project, we aimed at demonstrating that querying a knowledge graph about
the content of a digital library can semi-automatically generate a book’s trailer based on this
information, and can favour the diffusion of the contents of a digital library and enhance the
cultural heritage contained in those digital libraries.
   StoryBook’s potential is based on technologies such as:
    • Linked Data to allow disambiguation of concepts and the connection of the information
      with the Web;
    • Named entity recognition (NER) to identify people, cities, organizations and things in
      unstructured text like that of books;
    • Knowledge graphs to organize information around concepts with their relations;
    • Computer Vision to detect objects in the images of books allowing their searchability.
   Figure 1 shows the whole process that leads to the semi-automatic creation of a book
trailer. The process starts at point 1 with a digital library (a collection of books in PDF
format). Through the Arca platform [2, 3, 1] (which was developed by the same authors in
previous projects), we extracted all the concepts of a book and its most relevant concepts with
Natural Language Processing (NLP) techniques, linked these concepts with the DBpedia
Knowledge Graph¹, and sent all this information along with the relevant metadata to a
linked data container (point 3).

With the StoryBook project, we started by extracting images from books (point 2.b.1). The process
is designed to work on books with meaningful visual content in the images extracted from the PDF.
We selected the “k” best images of the book. After trying different methods, the simplest one that
yielded good results was to sort the images by their dimensions and keep the largest. We
observed that book publishers tend to let essential images take up more space on the page (so
these images are larger once extracted).
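
The size-based selection described above can be sketched as follows. This is a minimal illustration, not the project's code: the `ExtractedImage` type, the `select_best_images` helper and the use of pixel area as the measure of "dimensions" are all assumptions made for the example.

```python
from typing import NamedTuple


class ExtractedImage(NamedTuple):
    """Hypothetical record for an image extracted from a book PDF."""
    name: str
    width: int   # pixels
    height: int  # pixels


def select_best_images(images, k):
    """Return the k largest images, largest first.

    Pixel area (width * height) is an assumed size measure; the paper
    only says that images are sorted by their dimensions.
    """
    ranked = sorted(images, key=lambda img: img.width * img.height, reverse=True)
    return ranked[:k]


images = [
    ExtractedImage("ornament.png", 120, 80),
    ExtractedImage("map.png", 1600, 1200),
    ExtractedImage("vase.png", 900, 1100),
]
best = select_best_images(images, k=2)
print([img.name for img in best])
```

With this heuristic, small decorative elements such as page ornaments fall to the bottom of the ranking and are dropped when “k” is smaller than the number of extracted images.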

Point 2.b.2 is object detection. To speed up development, instead of
implementing a custom image classifier, we used an external service, Google
Vision AI². We retrieve up to “o” labels for each image, along with their confidence level. The
concepts and top “l” concepts associated with a book, together with its metadata (from point 2.a),
and the “o” labels (point 2.b.2) associated with the images of the book (point 2.b.1) are sent as linked
data to the linked data container (point 3).

Figure 1: StoryBook pipeline
   ¹ https://dbpedia.org/
   ² https://cloud.google.com/

At point 4, we can now query different information about a book:
    • all the concepts;
    • the most relevant concepts;
    • the metadata (like title, author, topics, etc.);
    • the objects contained in the book’s images.
    We introduced some parameters to allow users who are curators of a publishing house to
filter the video search. The parameters are:
    • the video duration;
    • the type of video licence;
    • a white list containing words relevant for curators;
    • a black list containing the words that must not appear in the search;
    • the book’s metadata.
   We use this information to crawl the Web looking for relevant videos. After experimenting with
different sources of video content, we decided to focus on a single source, YouTube³. YouTube is
currently the most extensive database of videos in the world. We can start the query to YouTube
filtered with all the elicited parameters.
   ³ https://www.youtube.com/
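
As an illustration of how the curator's parameters could be turned into a video query, the sketch below builds a request dictionary shaped like the parameters of the YouTube Data API v3 `search.list` method (`q`, `type`, `videoDuration`, `videoLicense`, `maxResults`). The paper does not say which access method was used for crawling, so the choice of this API, the `SearchFilters` class and the `build_search_params` helper are assumptions for the example; no request is actually sent.

```python
from dataclasses import dataclass, field


@dataclass
class SearchFilters:
    """Hypothetical container for the curator's filter parameters."""
    duration: str = "any"   # "any", "short", "medium" or "long"
    licence: str = "any"    # "any", "creativeCommon" or "youtube"
    white_list: list = field(default_factory=list)
    black_list: list = field(default_factory=list)


def build_search_params(terms, filters, m):
    """Build search parameters for at most m videos.

    terms: concepts, image labels or metadata values taken from the book.
    Black-listed words are prefixed with "-" (Boolean NOT in the query
    string); single-word black-list entries are assumed.
    """
    include = list(terms) + list(filters.white_list)
    exclude = [f"-{word}" for word in filters.black_list]
    return {
        "part": "snippet",
        "type": "video",
        "q": " ".join(include + exclude),
        "videoDuration": filters.duration,
        "videoLicense": filters.licence,
        "maxResults": m,
    }


filters = SearchFilters(
    duration="short",
    licence="creativeCommon",
    white_list=["archaeology"],
    black_list=["game"],
)
params = build_search_params(["Pompeii", "fresco"], filters, m=12)
print(params["q"])
```

Keeping the filters in one object makes it straightforward to store them alongside the results, so a curator can see which settings produced a given set of videos.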

The next step (point 5) of the process consists of filtering the “m” videos resulting from the
previous step and organising them into a draft of the final trailer. The algorithm assigns each
video a multidimensional score (one dimension for each image). The score increases when there
is a match between the image’s labels and the video’s metadata, such as the description. Each
match is weighted by the confidence of the image-label association (as assigned during the
classification step). Once each video has a score, the algorithm selects, for each image, the
video with the highest score, associates that video with the image, and discards all the others.
Afterwards, the trailer is generated by interleaving the extracted images and their corresponding
retrieved videos.
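
The scoring and matching step can be sketched as below. This is one possible reading of the description, under stated assumptions: a "match" is taken to be a case-insensitive substring match of a label in the video's title or description, an image with no matching video is left without one, and the same video may be chosen for more than one image. The dictionary layout and the names `score_videos` and `build_draft` are hypothetical.

```python
def score_videos(images, videos):
    """Return scores[v][i]: the score of video v for image i.

    The score is the sum of the confidences of image i's labels that
    occur in video v's title or description (assumed matching rule).
    """
    scores = []
    for video in videos:
        text = (video["title"] + " " + video["description"]).lower()
        row = [
            sum(conf for label, conf in image["labels"] if label.lower() in text)
            for image in images
        ]
        scores.append(row)
    return scores


def build_draft(images, videos):
    """Interleave each image with its best-scoring video, if any."""
    scores = score_videos(images, videos)
    draft = []
    for i, image in enumerate(images):
        draft.append(image["name"])
        if not videos:
            continue
        best = max(range(len(videos)), key=lambda v: scores[v][i])
        if scores[best][i] > 0:  # skip images with no matching video
            draft.append(videos[best]["id"])
    return draft


images = [
    {"name": "img1", "labels": [("vase", 0.9), ("pottery", 0.8)]},
    {"name": "img2", "labels": [("temple", 0.95)]},
]
videos = [
    {"id": "vidA", "title": "Greek temple tour", "description": "A walk through a Doric temple."},
    {"id": "vidB", "title": "Ancient pottery", "description": "How a Greek vase was made."},
    {"id": "vidC", "title": "Cooking pasta", "description": "Nothing relevant."},
]
print(build_draft(images, videos))
```

In this example `vidC` matches no label and is discarded, while `vidB` and `vidA` are paired with the first and second image respectively.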

This trailer draft is generated (point 6), for compatibility reasons, as an annotated PowerPoint
presentation, thus allowing the curator to manipulate the content as they see fit before
the actual video creation. Furthermore, all the data generated along the pipeline is packaged
inside a JSON file that the curator can access to check the intermediate results and other details
of the process.

  We carried out a preliminary evaluation with a domain expert. Satisfactory results were
obtained by choosing:

    • “l” (number of top concepts) = “k” (number of book images) = “o” (number of objects detected
      in one image) = 10
    • “m” (number of videos returned by the query) = 12

  Furthermore, the domain expert gave some general feedback on the system paradigm. She
identified several perceived strengths and potentialities of the system:

    • the transformation of the images of books into information nodes;
    • the control of the automatic creation process of a book trailer, through accessible
      configuration parameters;
    • the free access and management of the information output generated by StoryBook.

  She also expressed some concerns for the perceived weaknesses:

    • the scarce availability of videos of niche topics on the Web;
    • the prolonged extraction process duration for searches involving longer crawler queries.


3. Concluding remarks
With StoryBook, a tool for the semi-automatic generation of book trailers, we aimed at introducing
an innovation in the scientific publishing market for the historical, historical-artistic and
archaeological disciplines, a specific context characterized by users such as students, researchers,
scholars, academics, cultural bodies and institutions, and universities operating in the sector of
Cultural Heritage and antiquity sciences. By changing the traditional approach to accessing
and using the texts and images of a digital library, we hope this work will have a
strong impact in the reference context. For the future, we have planned a formal user study to
further evaluate the system.


Acknowledgments
This work has been partly supported by the project STORYBOOK (POR FESR Lazio 2014-2020 -
Avviso Pubblico “Progetti di Innovazione Digitale”, domanda prot. n. A0349-2020-34437).


References
[1] Bernasconi, E., Ceriani, M., Mecella, M., Catarci, T., Capanna, C., Di Fazio, C., Marcucci, R.,
    Pender, E., Petriccione, F.: ARCA. Semantic exploration of a bookstore. (AVI ’20). Association
    for Computing Machinery, New York, NY, USA, Article 78, pp. 1–3. (2020).
[2] Ceriani, M., Bernasconi, E., Mecella, M.: A Streamlined Pipeline to Enable the Semantic
    Exploration of a Bookstore. (IRCDL 2020). Springer International Publishing, Cham, pp.
    75–81. (2020).
[3] Bernasconi, E., Ceriani, M., Mecella, M.: Exploring a Text Corpus via a Knowledge Graph.
    (IRCDL 2021). CEUR Workshop Proceedings, pp. 91–102. (2021).
[4] Nikonova, Nadezhda Ilinichna, Zalutskaya, Svetlana Yrievna (2021). Modern communication
    technologies in education: book trailer. Revista Tempos e Espaços em Educação, 14(33),
    ISSN 2358-1425, Revista Tempos e Espacos em Educacao.
[5] Chepukov, K.Yu. (2021). Expressive means of painting as a means of self-expression of
    younger schoolchildren. Искусство и образование (6), 177-182, ISSN 2072-0432,
    International Centre Art and Education.
[6] Jiménez-Marín, Gloria, Zambrano, Rodrigo Elías (2020). The Book Trailer as a Publishing
    House Promotional Tool. Advances in Business Strategy and Competitive Advantage, 147-
    160, ISSN 2327-3429, IGI Global.
[7] D. Luchev, D. Paneva-Marinova, M. Dimova (2019) Digital Storytelling and digital book
    trailer applications for educational purposes in Bulgaria, INTED2019 Proceedings, pp.
    529-534.

</pre>