=Paper=
{{Paper
|id=Vol-3144/RP-paper12
|storemode=property
|title=StoryBook - A Storytelling-based Platform for Digital Book Stores
|pdfUrl=https://ceur-ws.org/Vol-3144/RP-paper12.pdf
|volume=Vol-3144
|authors=Eleonora Bernasconi,Miguel Ceriani,Francesca De Luzi,Clara Di Fazio,Roberto Marcucci,Massimo Mecella,Fabio Petriccione,Francesco Sapio
|dblpUrl=https://dblp.org/rec/conf/rcis/BernasconiCLFMM22
}}
==StoryBook - A Storytelling-based Platform for Digital Book Stores==
Eleonora Bernasconi¹, Miguel Ceriani², Francesca De Luzi¹, Clara Di Fazio³, Roberto Marcucci⁴, Massimo Mecella¹, Fabio Petriccione³ and Francesco Sapio¹

¹ Sapienza Università di Roma, Dipartimento di Ingegneria informatica, automatica e gestionale Antonio Ruberti, via Ariosto, 25, 00185 Rome, Italy

² Università degli Studi di Bari Aldo Moro, Dipartimento di Informatica, via Edoardo Orabona, 4, 70125 Bari, Italy

³ ATON IT - Innovation Technology, via Fiume Giallo, 3, 00144 Rome, Italy

⁴ L’Erma di Bretschneider, via Marianna Dionigi, 57, 00193 Rome, Italy

===Abstract===
The StoryBook project stems from the cooperation between Sapienza Università di Roma, the company Aton IT and the historic publishing house L’Erma di Bretschneider. The project belongs to the broad area of Digital Humanities and has led to the development of a prototype system for the semi-automatic generation of video trailers for books.

Keywords: storytelling, knowledge extraction, digital library

===1. Introduction===
StoryBook is a research project in the field of digital humanities. It proposes extracting and managing knowledge from a digital library in order to create video trailers of books semi-automatically. StoryBook was originally conceived to meet the needs of the “L’Erma di Bretschneider” publishing house, which deals with topics related to ancient history and archaeology. From the publishers’ point of view, promotional trailers respond to a changing market with a strong focus on digital and visual media. The goal of a digital presentation of a book is nevertheless broader than selling it, and includes providing helpful information to the potential future reader. A publishing house wants to disclose the contents of its digital library not only to experts in the sector but also to interested people, who are attracted by the contents of its books when these are shown on the Web through search tools or advertising such as video trailers.
Numerous studies show that a book trailer fosters the desire to learn and the motivation to read [4, 5, 6, 7]. StoryBook is a software tool to support the creation of book trailers by collecting and organizing relevant video content. The system users retain control over how to edit and compose the content. The proposed technique aims at semi-automatically building digital trailers that allow viewers who are generically interested in a specialized topic, but are not experts, to better appreciate the topic, both for their own cultural or professional enrichment and for a possible purchase.

Joint Proceedings of RCIS 2022 Workshops and Research Projects Track, May 17-20, 2022, Barcelona, Spain. CEUR Workshop Proceedings (CEUR-WS.org, http://ceur-ws.org), ISSN 1613-0073.

Email: bernasconi@diag.uniroma1.it (E. Bernasconi); miguel.ceriani@uniba.it (M. Ceriani); deluzi@diag.uniroma1.it (F. De Luzi); clara.difazio@atoninformatica.it (C. Di Fazio); roberto.marcucci@lerma.it (R. Marcucci); mecella@diag.uniroma1.it (M. Mecella); fabio.petriccione@atoninformatica.it (F. Petriccione); sapio@diag.uniroma1.it (F. Sapio)

ORCID: 0000-0003-3142-3084 (E. Bernasconi); 0000-0002-5074-2112 (M. Ceriani); 0000-0002-9896-2528 (F. De Luzi); 0000-0002-9730-8882 (M. Mecella); 0000-0001-9544-2108 (F. Sapio)

© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

===2. Summary of project results===
With the StoryBook project, we aimed to demonstrate that a book trailer can be generated semi-automatically by querying a knowledge graph about the content of a digital library, and that this can promote the dissemination of the library's contents and enhance the cultural heritage it contains.
StoryBook's potential rests on technologies such as:
* Linked Data, to allow disambiguation of concepts and the connection of the information with the Web;
* Named entity recognition (NER), to identify people, cities, organizations and things in unstructured text like that of books;
* Knowledge graphs, to organize information around concepts and their relations;
* Computer Vision, to detect objects in the images of books and make them searchable.

Figure 1 shows the whole process that leads to the semi-automatic creation of a book trailer. The process starts at point 1 with a digital library (a collection of books in PDF format). Through the Arca platform [2, 3, 1], which was developed by the same authors in previous projects, we extracted the concepts of a book and the most relevant among them with Natural Language Processing (NLP) techniques, linked these concepts with the DBpedia Knowledge Graph<sup>1</sup>, and sent all this information, along with the relevant metadata, to a linked data container (point 3).

With the StoryBook project, we started extracting images from books (point 2.b.1). The process is designed to work on books with meaningful visual content in the images extracted from the PDF. We selected the “k” best images of the book. After trying different methods, the simplest one that yielded good results was sorting the images by their dimensions. We observed that book publishers tend to let essential images take up more space on the page, so these images are larger once extracted.

Point 2.b.2 is object detection. To speed up development, instead of implementing a custom image classifier, we used an external service, Google Vision AI<sup>2</sup>. We retrieve up to “o” labels for each image, along with their confidence level.
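The image-selection heuristic described above (sort the extracted images by size and keep the “k” largest) can be sketched as follows. This is a minimal illustration, not the project's implementation: the `ExtractedImage` record, the `select_top_k_images` function and the choice of pixel area as the measure of "dimensions" are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExtractedImage:
    """An image extracted from a book PDF (hypothetical record type)."""
    name: str
    width: int   # pixels
    height: int  # pixels


def select_top_k_images(images, k):
    """Return the k images with the largest pixel area, largest first.

    Assumption: "dimensions" is interpreted as width * height.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    ranked = sorted(images, key=lambda im: im.width * im.height, reverse=True)
    return ranked[:k]


# Illustrative data only; file names and sizes are made up.
images = [
    ExtractedImage("ornament.png", 120, 80),
    ExtractedImage("plate_3.png", 1600, 1200),
    ExtractedImage("map.png", 900, 700),
]
best = select_top_k_images(images, k=2)
print([im.name for im in best])  # ['plate_3.png', 'map.png']
```

Small decorative elements such as ornaments naturally fall to the bottom of the ranking, which matches the observation that publishers give essential images more space on the page.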
The concepts, the top “l” concepts and the metadata of a book (from point 2.a), together with the “o” labels (point 2.b.2) associated with the images of the book (point 2.b.1), were sent as linked data to the linked data container (point 3). At point 4, we can query several kinds of information about a book:
* all the concepts;
* the most relevant concepts;
* the metadata (like title, author, topics, etc.);
* the objects contained in the book’s images.

Figure 1: StoryBook pipeline

<sup>1</sup> https://dbpedia.org/; <sup>2</sup> https://cloud.google.com/; <sup>3</sup> https://www.youtube.com/

We introduced some parameters that allow users who are curators of a publishing house to filter the video search. The parameters are:
* the video duration;
* the type of video licence;
* a white list containing words relevant for curators;
* a black list containing the words that must not appear in the search;
* the book’s metadata.

We use this information to crawl the Web for relevant videos. After experimenting with different sources of video content, we decided to focus on a single source, YouTube<sup>3</sup>. YouTube is currently the most extensive database of videos in the world. We then query YouTube, filtering with all the parameters listed above.

The next step of the process (point 5) consists of filtering the “k” videos resulting from the previous step and organising them in a draft of the final trailer. The algorithm assigns each video a multidimensional score (one dimension for each image). The score increases when there is a match between the image’s labels and the video’s metadata, such as the description. Each match is weighted by the confidence of the image-label association, as assigned during the classification step. Once each video has a score, the algorithm selects, for each image, the video with the highest score, associates it with that image, and discards all the others. Afterwards, the trailer is generated by interleaving the extracted images and their corresponding retrieved videos.
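The scoring and matching step described above can be sketched as follows. This is only an illustration under stated assumptions: the function names `score_video` and `build_trailer_draft`, the dictionary-based data layout, and the use of case-insensitive substring matching between labels and video text are hypothetical choices, and the sketch lets the same video be selected for more than one image, which the description above does not specify.

```python
def score_video(video_text, labels):
    """Sum the confidences of the image labels that occur in the video's text.

    Assumption: a "match" is a case-insensitive substring occurrence.
    """
    text = video_text.lower()
    return sum(conf for label, conf in labels.items() if label.lower() in text)


def build_trailer_draft(image_labels, videos):
    """Pick the best-scoring video per image and interleave images and videos.

    image_labels: dict mapping image name -> {label: confidence}
    videos:       dict mapping video id -> metadata text (e.g. description)
    Returns a flat list: image, its video, next image, its video, ...
    An image with no matching video is kept without a video.
    """
    draft = []
    for image, labels in image_labels.items():
        # One score per video for this image (one "dimension" of the score).
        scores = {vid: score_video(text, labels) for vid, text in videos.items()}
        best = max(scores, key=scores.get, default=None)
        draft.append(image)
        if best is not None and scores[best] > 0:
            draft.append(best)
    return draft


# Illustrative data only; labels, confidences and video texts are made up.
image_labels = {
    "plate_3.png": {"amphora": 0.9, "pottery": 0.6},
    "map.png": {"map": 0.8, "rome": 0.7},
}
videos = {
    "v1": "Greek amphora and ancient pottery workshop",
    "v2": "A historical map of Rome",
    "v3": "Cooking pasta at home",
}
draft = build_trailer_draft(image_labels, videos)
print(draft)  # ['plate_3.png', 'v1', 'map.png', 'v2']
```

Substring matching is a deliberate simplification; a fuller version would tokenize the metadata so that a short label does not match inside an unrelated longer word.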
For compatibility reasons, this trailer draft is generated (point 6) as an annotated PowerPoint presentation, which allows the curator to manipulate the content as they see fit before the actual video creation. Furthermore, all the data generated along the pipeline is packaged inside a JSON file that the curator can access to check the intermediate results and other details of the process.

We carried out a preliminary evaluation with a domain expert. Satisfactory results were obtained by choosing:
* “l” (number of top concepts) = “k” (number of book images) = “o” (number of objects detected in one image) = 10;
* “m” (number of videos returned by the query) = 12.

Furthermore, the domain expert gave some general feedback on the system paradigm. She identified several perceived strengths and potentialities of the system:
* the transformation of the images of books into information nodes;
* the control of the automatic creation process of a book trailer, through accessible configuration parameters;
* the free access to, and management of, the information output generated by StoryBook.

She also expressed some concerns about the perceived weaknesses:
* the scarce availability of videos on niche topics on the Web;
* the prolonged duration of the extraction process for searches involving longer crawler queries.

===3. Concluding remarks===
With StoryBook, a tool for the semi-automatic generation of book trailers, we aimed at introducing an innovation in the scientific publishing market for the historical, historical-artistic and archaeological disciplines. This is a specific context characterized by users such as students, researchers, scholars, academics, cultural bodies and institutions, and universities operating in the sector of Cultural Heritage and antiquity sciences. By revolutionizing the traditional approach to accessing and using the texts and images of a digital library, we hope that this work will have a strong impact in this context.
As future work, we have planned a formal user study to further evaluate the system.

===Acknowledgments===
This work has been partly supported by the project STORYBOOK (POR FESR Lazio 2014-2020 - Avviso Pubblico “Progetti di Innovazione Digitale”, domanda prot. n. A0349-2020-34437).

===References===
[1] Bernasconi, E., Ceriani, M., Mecella, M., Catarci, T., Capanna, C., Di Fazio, C., Marcucci, R., Pender, E., Petriccione, F.: ARCA. Semantic exploration of a bookstore. (AVI ’20). Association for Computing Machinery, New York, NY, USA, Article 78, pp. 1–3. (2020).

[2] Ceriani, M., Bernasconi, E., Mecella, M.: A Streamlined Pipeline to Enable the Semantic Exploration of a Bookstore. (IRCDL 2020). Springer International Publishing, Cham, pp. 75–81. (2020).

[3] Bernasconi, E., Ceriani, M., Mecella, M.: Exploring a Text Corpus via a Knowledge Graph. (IRCDL 2021). CEUR Workshop Proceedings, pp. 91–102. (2021).

[4] Nikonova, N. I., Zalutskaya, S. Y.: Modern communication technologies in education: book trailer. Revista Tempos e Espaços em Educação, 14(33), ISSN 2358-1425. (2021).

[5] Chepukov, K. Yu.: Expressive means of painting as a means of self-expression of younger schoolchildren. Искусство и образование (6), pp. 177–182, ISSN 2072-0432, International Centre Art and Education. (2021).

[6] Jiménez-Marín, G., Zambrano, R. E.: The Book Trailer as a Publishing House Promotional Tool. Advances in Business Strategy and Competitive Advantage, pp. 147–160, ISSN 2327-3429, IGI Global. (2020).

[7] Luchev, D., Paneva-Marinova, D., Dimova, M.: Digital Storytelling and digital book trailer applications for educational purposes in Bulgaria. INTED2019 Proceedings, pp. 529–534. (2019).