Identifying Transmedia Works from User-Generated Knowledge Bases: Japanese Pop Culture Study Case

Stella Zevio1, Tetsuya Mihara2, and Shigeo Sugimoto2

1 LIPN - CNRS UMR 7030, Université Paris 13, France
zevio@lipn.univ-paris13.fr
2 Faculty of Library, Information and Media Science, University of Tsukuba, Japan
mihara@slis.tsukuba.ac.jp, sugimoto@slis.tsukuba.ac.jp

Abstract. As Japanese pop culture spreads worldwide, digital libraries that compile information about works in its representative media (manga, anime, video games) are emerging. Some of these works may share the same story, characters or universe, and thus belong to a conceptual entity which we call a transmedia work in this paper. Transmedia works are abstract entities composed of works in several media linked together by semantic relationships. Identifying the works that belong to the same transmedia work remains a challenge for improving access to, retrieval and organization of media in digital libraries. To overcome this challenge, the semantic relationships between works must be identified. As no authority data yet describes these relationships, we need to find this information in user-generated knowledge bases such as Wikipedia. More precisely, we exploit DBpedia, Wikipedia's Linked Data counterpart, to comply with semantic web standards. In this paper, we present our method and experiment for building work entity datasets of Japanese pop culture (manga, anime and video games) and extracting relationships between these works from the semantic data structure used in DBpedia, in order to ease the identification of transmedia works. We also extract relevant information to link works to bibliographic data in the future. We evaluate our contribution and show that works belonging to the same transmedia work can be identified easily and with good accuracy from user-generated knowledge bases.
Keywords: Transmedia work · Semantic Web · Digital Libraries · Linked data · Domain-dependent semantic data analysis

1 Introduction

Several works in different media can sometimes express the same story, take place in the same universe or feature the same characters. In this case, these works are part of the same transmedia work [17].

Example 1. The "Dragon Ball Z: Budokai" video game and the "Dragon Ball" anime take place in the same universe and share several characters; they are therefore both part of the same transmedia work.

Identifying transmedia works would improve access to, retrieval and organization of media, in particular in digital libraries. Indeed, identifying semantically linked works is still a challenge and a key issue for recommendation in digital libraries supporting various media formats.

As Japanese pop culture has long been considered a subculture unworthy of interest, there is neither sufficient authority data on its representative media nor a reliable knowledge base describing relations between works. Still, digital libraries and databases of Japanese pop culture media are emerging as interest in Japanese pop culture grows worldwide. The Media Art Database[6] (MADB) is a database of manga, animation and video games published in Japan, produced by the Agency for Cultural Affairs of Japan as the national authority on works in these media. However, although MADB compiles information about works in different media, it lacks information about the relationships between these works. This information is instead available in user-generated knowledge bases such as Wikipedia[7].

In this research, our aim is to identify transmedia works of Japanese pop culture, as no authority data describes them.
To achieve this goal, we first extract work entity datasets of manga, anime and video games from user-generated knowledge bases, then exploit the semantic relationships described by users between these works to find works belonging to the same transmedia work. We use DBpedia[1], Wikipedia's Linked Open Data dataset, to extract relations between works in an interpretable and interoperable way, according to semantic web standards. Using DBpedia lets us benefit from the simplicity, interoperability and interpretability of semantic web technologies. We choose to exploit English resources, as they are generally considered richer than Japanese ones. This method naturally depends heavily on the data structure used in DBpedia; we discuss this issue in Section 4. Our contribution lies at the interface between domain-dependent semantic data analysis and knowledge extraction from Linked Data.

This paper presents our method and experiment in building work entity datasets of manga, anime and video games and extracting semantic relationships between them in order to identify works belonging to the same transmedia work. In Section 2, we present related work. In Section 3, we present our experiment, the results we obtained and their evaluation. In Section 4, we present our conclusions and discuss further work.

2 Related Work

Bibliographic information describing semantic relationships between works is useful for transmedia publications such as adaptations. The Functional Requirements for Bibliographic Records (FRBR)[13] model, developed by the International Federation of Library Associations and Institutions (IFLA)[16], defines entities and their relationships to support advanced functions of bibliographic records. In the FRBR model, the Work entity is defined as an abstract entity expressing a distinct intellectual or artistic creation.
Different editions or translations of the same creation are semantically connected to each other. In addition, works belonging to the same creation group also have semantic relationships with each other, for example William Shakespeare's Romeo and Juliet and its namesake film adaptation by Franco Zeffirelli.

The FRBR model is commonly used as a conceptual data model for bibliographic records [16], for cataloging rules (Resource Description and Access (RDA)[8] being the most representative) and even for pop culture databases. McDonough et al.[12] evaluate the usability of the FRBR model to describe relationships between various editions, translations and adaptations of video games. Jett et al.[10] developed a conceptual model reflecting FRBR for video games and interactive media.

However, although the FRBR model is intended to describe relationships between entities, datasets or records describing such relationships are lacking, especially for Japanese pop culture[11]. OCLC WorldCat Fiction Finder[3] provides data about relationships between different editions of the same work; unfortunately, animation and video game records are poorly covered by this database. FRBRization[15] is a method for creating FRBR datasets from existing datasets and conventional bibliographic records. For example, WorldCat Fiction Finder is populated from MARC[5] bibliographic and authority records using the OCLC FRBR Work-Set Algorithm[4]. He et al.[9] proposed a method for identifying FRBR Works of manga using Wikipedia, through DBpedia articles: DBpedia is used as a reference authority to identify Work-level entities of manga in the catalog records of the Kyoto Manga Museum, the largest library for manga in Japan. Takhirov et al.[14] proposed a method for linking a FRBR entity to its corresponding LOD entity, with an evaluation using DBpedia and the Amazon bookstore's Web API.
Although He et al. and Takhirov et al. focus on information about books and do not address transmedia works, they suggest that using DBpedia as a source of work entities and their relationships is a viable solution. As DBpedia has many resources about transmedia works including manga, anime and video games, our contribution aims at measuring the quantity and quality of the transmedia works, and of the semantic relationships between them, that can be extracted from DBpedia with simple SPARQL queries.

3 Experiment

3.1 Overview

In order to identify transmedia works, we conduct an experiment consisting of two steps. As few authority datasets are available, our first step, described in Section 3.2, consists in building our own work entity datasets of manga, anime and video games from DBpedia. The second step, described in Section 3.3, consists in exploiting the semantic relationships described by users to link works in different media together.

We harvest both the DBpedia SPARQL endpoint and the DBpedia Live[2] SPARQL endpoint and compare the results obtained with each. The DBpedia Live SPARQL endpoint is the most up-to-date one, as it is continuously synchronized with Wikipedia, while the DBpedia SPARQL endpoint is only updated periodically. In theory, we should expect more accurate results with DBpedia Live, assuming that the knowledge available in DBpedia grows larger and more accurate with Wikipedia users' contributions.

3.2 Datasets

In DBpedia, a concept is described by an article. An article is defined as an RDF resource, and additional information such as links to other articles is described as properties of this RDF resource. To determine that an article describes a manga, an anime or a video game, we exploit its rdf:type property: an article describing a manga has the rdf:type value dbo:Manga, and an article about an anime or a video game has the rdf:type value dbo:Anime or dbo:VideoGame respectively.
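The class-membership test above can be sketched in Python as a helper that generates one Listing 1.1-style query per medium and encodes it as an HTTP GET request for either endpoint. This is a minimal sketch under stated assumptions: the DBpedia Live URL comes from reference [2], while the main endpoint URL, the `format` request parameter and the optional ORDER BY/LIMIT/OFFSET paging are illustrative choices that are not part of the experiment described here. No request is actually sent.

```python
from typing import Optional
from urllib.parse import urlencode

# DBpedia ontology classes used to recognise each medium (Section 3.2).
MEDIA_CLASSES = {
    "manga": "dbo:Manga",
    "anime": "dbo:Anime",
    "video games": "dbo:VideoGame",
}

# DBpedia Live URL from reference [2]; the main endpoint URL is an assumption.
ENDPOINTS = {
    "DBpedia": "https://dbpedia.org/sparql",
    "DBpedia Live": "http://live.dbpedia.org/sparql",
}

PREFIXES = (
    "PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>\n"
    "PREFIX dbo: <http://dbpedia.org/ontology/>\n"
)


def type_query(medium: str, limit: Optional[int] = None, offset: int = 0) -> str:
    """Build a query selecting every resource typed with the medium's class.

    If `limit` is given, an ORDER BY/LIMIT/OFFSET clause is appended so that a
    large class can be fetched page by page in a stable order (assumed paging).
    """
    query = PREFIXES + (
        f"SELECT DISTINCT ?Concept WHERE {{ ?Concept rdf:type {MEDIA_CLASSES[medium]} }}"
    )
    if limit is not None:
        query += f" ORDER BY ?Concept LIMIT {limit} OFFSET {offset}"
    return query


def request_url(endpoint: str, query: str) -> str:
    """Encode the query as a SPARQL Protocol GET request URL (nothing is sent)."""
    # 'format' is an assumed, endpoint-specific parameter asking for JSON results.
    params = {"query": query, "format": "application/sparql-results+json"}
    return ENDPOINTS[endpoint] + "?" + urlencode(params)
```

For instance, `request_url("DBpedia Live", type_query("anime"))` yields a URL that any HTTP client can fetch. Paging is included because public SPARQL endpoints commonly cap the number of rows returned per query; the actual cap of a given endpoint is an assumption to check before harvesting a class as large as dbo:VideoGame.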
We build the manga, anime and video game datasets by harvesting DBpedia and DBpedia Live with queries following the structure of Listing 1.1, shown here for manga. Table 1 presents the number of results obtained.

    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX dbo: <http://dbpedia.org/ontology/>
    SELECT DISTINCT ?Concept WHERE {
      ?Concept rdf:type dbo:Manga }

Listing 1.1. SPARQL query: Manga

Media | Number of results (DBpedia) | Number of results (DBpedia Live)
Manga | 3783 | 3928
Anime | 4271 | 5014
Video games | 28869 | 20807

Table 1. Datasets of manga, anime and video games obtained by harvesting DBpedia and DBpedia Live on 12-20-2017

3.3 Identification of works belonging to the same transmedia work

In order to identify semantic relationships between works in different media, we exploit the semantic links between the articles describing them. For example, an article describing an anime may have a dct:subject value that also applies to an article describing a manga. If so, this anime and this manga belong to the same transmedia work. We exploit any direct semantic relationship between articles about works in different media, as well as some indirect ones. All queries are derived from the structure shown in Listing 1.2, and Table 2 presents the number of results obtained.

    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX dbo: <http://dbpedia.org/ontology/>
    PREFIX dct: <http://purl.org/dc/terms/>
    PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
    PREFIX dbc: <http://dbpedia.org/resource/Category:>
    SELECT DISTINCT ?Manga ?Anime WHERE {
      # Direct link from the anime article to the manga article
      { ?Manga rdf:type dbo:Manga .
        ?Anime rdf:type dbo:Anime .
        ?Anime ?p ?Manga }
      UNION
      # Direct link from the manga article to the anime article
      { ?Anime rdf:type dbo:Anime .
        ?Manga rdf:type dbo:Manga .
        ?Manga ?p ?Anime }
      UNION
      # Indirect link: both articles share a category named after a series
      { ?Anime rdf:type dbo:Anime .
        ?Anime dct:subject ?Category .
        ?Category skos:broader dbc:Wikipedia_categories_named_after_anime_and_manga_series .
        ?Manga rdf:type dbo:Manga .
        ?Manga dct:subject ?Category }
    }

Listing 1.2. SPARQL query: Manga-Anime belonging to the same transmedia work

Media pair | Number of results (DBpedia) | Number of results (DBpedia Live)
Manga - Anime | 764 | 696
Manga - Video games | 864 | 191
Anime - Video games | 411 | 135

Table 2. Pairs of works in different media belonging to the same transmedia work obtained by harvesting DBpedia and DBpedia Live on 12-20-2017

3.4 Extraction of links between works and bibliographic data

Digital libraries may compile information about works through bibliographic data according to the FRBR model [3]. Reconciling works with bibliographic data is therefore a key issue. To ease this reconciliation, we exploit semantic links between articles describing manga and lists of chapters, and between articles describing anime and lists of episodes, following the query structure shown in Listing 1.3. Table 3 presents the number of results obtained.

    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX dbo: <http://dbpedia.org/ontology/>
    PREFIX dct: <http://purl.org/dc/terms/>
    PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
    PREFIX dbc: <http://dbpedia.org/resource/Category:>
    SELECT DISTINCT ?Manga ?List WHERE {
      # Direct link from the manga article to the list article
      { ?List dct:subject dbc:Lists_of_manga_volumes_and_chapters .
        ?Manga rdf:type dbo:Manga .
        ?Manga ?p ?List }
      UNION
      # Direct link from the list article to the manga article
      { ?Manga rdf:type dbo:Manga .
        ?List dct:subject dbc:Lists_of_manga_volumes_and_chapters .
        ?List ?p ?Manga }
      UNION
      # Indirect link: both articles share a category named after a series
      { ?List dct:subject dbc:Lists_of_manga_volumes_and_chapters .
        ?List dct:subject ?Category .
        ?Category skos:broader dbc:Wikipedia_categories_named_after_anime_and_manga_series .
        ?Manga rdf:type dbo:Manga .
        ?Manga dct:subject ?Category }
    }

Listing 1.3. SPARQL query: Manga - List of chapters

Work - Bibliographic data | Number of results (DBpedia) | Number of results (DBpedia Live)
Manga - List of chapters | 244 | 247
Anime - List of episodes | 266 | 275

Table 3. Pairs of works and bibliographic data obtained by harvesting DBpedia and DBpedia Live on 12-20-2017

3.5 Evaluation

As far as we know from the literature, no gold standard is available for manga, anime or video game data and the semantic links between them, so recall cannot be computed, and it is difficult to judge whether our queries cover the domain well. A possible solution would be to manually collect all works related across media for a certain number of known works, then evaluate the recall of our method against this restricted gold standard. However, even building a restricted gold standard requires a very high level of expertise, and most experts would rely on user-generated knowledge bases at some point. Thus, we do not propose a recall measure. Still, we can evaluate the accuracy of the queries, i.e. the relevance of the results they return.

To estimate the relevance of our results, we randomly selected 100 results for each query and asked two external experts of the domain to check the correctness of each result. We thus obtain an accuracy, the proportion of sampled results judged correct, as well as all the errors raised. This information, along with the error types encountered, is given in Tables 4, 5 and 6.

We obtained precise results overall. As expected, results obtained with the DBpedia Live SPARQL endpoint are more precise than with the DBpedia SPARQL endpoint in most cases (query 4 being the exception), with a surprisingly large gap between them. From the error types and the precision drop with the DBpedia SPARQL endpoint for the construction of the video game dataset and the identification of transmedia works, we can infer that the results depend heavily on the semantic data structure described by users, which may be inconsistent, as it is human-generated data.
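The sampling protocol described above can be sketched as follows. How the two experts' verdicts were combined is not stated; purely as an assumption, this sketch counts a result as correct only when both experts accept it. The function names and the fixed random seed are illustrative.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def sample_results(results: Sequence[T], k: int = 100, seed: int = 0) -> List[T]:
    """Draw k distinct query results uniformly at random (all of them if fewer)."""
    rng = random.Random(seed)  # fixed seed so the same sample can be re-drawn
    return rng.sample(list(results), min(k, len(results)))


def accuracy(judgements: Sequence[Tuple[bool, bool]]) -> float:
    """Proportion of sampled results judged correct.

    Each item holds the verdicts of the two experts; counting a result as
    correct only when both accept it is an assumption of this sketch.
    """
    if not judgements:
        raise ValueError("no judgements to evaluate")
    correct = sum(1 for first, second in judgements if first and second)
    return correct / len(judgements)
```

With 100 sampled results, the accuracy is simply the number of accepted results divided by 100, which is how a figure such as 94 % in Table 4 reads. A stricter or looser combination rule (e.g. accepting a result if either expert does) would only change the condition inside `accuracy`.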
Query | Accuracy (DBpedia) | Error types (DBpedia) | Accuracy (DBpedia Live) | Error types (DBpedia Live)
1 Manga | 94 % | Something other than a manga (Manga genre, novel, film, wafer silicon, anime) | 94 % | Something other than a manga (Novels)
2 Anime | 91 % | Something other than an anime (Drama, live action, director, studio, magazine, method of animation) | 94 % | Something other than an anime (Drama, live action)
3 Video games | 26 % | Something other than a video game (Card game, board game, Superbowl, gaming platform) | 100 % | None

Table 4. Evaluation on query results obtained on 12-20-2017 (Work entity datasets of manga, anime and video games)

Query | Accuracy (DBpedia) | Error types (DBpedia) | Accuracy (DBpedia Live) | Error types (DBpedia Live)
4 Manga - List of chapters | 91 % | A list of chapters is associated with something that is not a manga (Volume, mangaka (person)) | 85 % | A list of chapters is associated with something that is not a manga (Volume); the associated list of chapters does not correspond to the manga (Chapters of a manga from the same series)
5 Anime - List of episodes | 80 % | Something other than an anime (DBpedia page about "Anime" in general); the list of episodes does not correspond to the anime (Episodes of an anime from the same series) | 90 % | Something other than an anime (Visual novel); the list of episodes does not correspond to the anime (Episodes of an anime from the same or from different series)

Table 5. Evaluation on query results obtained on 12-20-2017 (Links between works and bibliographic data)
Query | Accuracy (DBpedia) | Error types (DBpedia) | Accuracy (DBpedia Live) | Error types (DBpedia Live)
6 Manga - Anime | 8 % | The manga and the anime do not belong to the same transmedia work | 89 % | Something other than a manga (Manga genre, company, magazine, mangaka (person)); something other than an anime (DBpedia page about "Anime" in general, list of episodes); the manga and the anime do not belong to the same transmedia work
7 Manga - Video games | 41 % | Something other than a manga (Manga series, manga genre, company, magazine, mangaka (person)); the manga and the video game do not belong to the same transmedia work | 77 % | Something other than a manga (Novels); the manga and the video game do not belong to the same transmedia work
8 Anime - Video games | 32 % | Something other than an anime (DBpedia page about "Anime" in general, company, visual novel); something other than a video game (Gaming hardware, video game genre, gaming platform) | 95 % | Something other than an anime (Drama); the anime and the video game do not belong to the same transmedia work

Table 6. Evaluation on query results obtained on 12-20-2017 (Works belonging to the same transmedia work)

4 Discussion and conclusion

With the help of very simple SPARQL queries, we managed to build work entity datasets of manga, anime and video games, which would be hard to create without authority datasets, whether manually or with simple computational methods. We also laid the groundwork for a future linkage between works and bibliographic data by linking manga to their lists of chapters and anime to their lists of episodes. Finally, we identified semantic relationships between manga, anime and video games, creating a semantic network that enables us to easily identify transmedia works. Although it is difficult to estimate the coverage of the domain as no gold standard is available, we obtained satisfactory accuracy as well as a substantial number of results.
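One straightforward way to turn the pairwise links of Table 2 into groups of works, given here as an illustrative sketch rather than as part of the experiment above, is to treat every Manga - Anime, Manga - Video games and Anime - Video games pair as an edge of a graph and to take its connected components with a union-find structure. The function name and the resource names in the usage example are hypothetical placeholders, not query output.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Set, Tuple


def transmedia_groups(pairs: Iterable[Tuple[Hashable, Hashable]]) -> List[Set[Hashable]]:
    """Group works into connected components of the work-to-work link graph."""
    parent: Dict[Hashable, Hashable] = {}

    def find(x: Hashable) -> Hashable:
        # Register unseen works, then walk to the root with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a: Hashable, b: Hashable) -> None:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for a, b in pairs:
        union(a, b)

    # Collect every work under the root of its component.
    groups: Dict[Hashable, Set[Hashable]] = defaultdict(set)
    for work in list(parent):
        groups[find(work)].add(work)
    return list(groups.values())
```

For example, the hypothetical pairs ("manga/A", "anime/A"), ("anime/A", "game/A1") and ("manga/B", "anime/B") yield two groups. Because a single spurious link merges two otherwise unrelated groups, errors of the kinds listed in Table 6 (e.g. links to the generic "Anime" page) would need to be filtered out before grouping.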
As expected, we generally obtained better results by harvesting the DBpedia Live SPARQL endpoint, which is the most up-to-date one.

We identified several limitations of this work. First, we use knowledge bases with user-generated content, which are not always exhaustive: information may be missing from Wikipedia, or may be available in Wikipedia but not accurately described semantically in DBpedia. This limitation is closely related to the lack of authority data in this field, so it is a necessary compromise. Second, we obtained disparate results depending on the SPARQL endpoint used. Since the consistency of user-generated data is not ensured, using an up-to-date endpoint is essential.

To pursue this work, an interesting research question would be how to link data to records of publications in different countries. A comparison between English and Japanese resources would help us determine whether multilingual processing could expand our results.

Acknowledgements

This work was supported by JSPS KAKENHI Grant Number 16H01754.

References

1. DBpedia. https://dbpedia.org, [Online; accessed 26-July-2018]
2. DBpedia Live. http://live.dbpedia.org/sparql, [Online; accessed 26-July-2018]
3. FictionFinder: A FRBR-based Prototype for Fiction in WorldCat. https://www.oclc.org/research/activities/fictionfinder.html, [Online; accessed 15-August-2018]
4. FRBR Work-Set Algorithm. https://www.oclc.org/research/activities/frbralgorithm.html, [Online; accessed 15-August-2018]
5. MARC. http://www.loc.gov/marc/umb/um01to06.html, [Online; accessed 15-August-2018]
6. Media Art Database. https://mediaarts-db.bunka.go.jp/, [Online; accessed 26-July-2018]
7. Wikipedia. https://en.wikipedia.org, [Online; accessed 26-July-2018]
8. RDA Steering Committee: About RDA. http://rda-rsc.org/content/about-rda, [Online; accessed 15-August-2018]
9. He, W., Mihara, T., Nagamori, M., Sugimoto, S.: Identification of Works of Manga Using LOD Resources: An Experimental FRBRization of Bibliographic Data of Comic Books. In: Proceedings of the 13th ACM/IEEE-CS Joint Conference on Digital Libraries. pp. 253–256. JCDL ’13 (2013)
10. Jett, J., Sacchi, S., Lee, J.H., Clarke, R.I.: A Conceptual Model for Video Games and Interactive Media. J. Assoc. Inf. Sci. Technol. 67(3), 505–517 (2016)
11. Kiryakos, S., Sugimoto, S., Nagamori, M., Mihara, T.: Aggregating Metadata from Heterogeneous Pop Culture Resources on the Web. In: International Conference on Dublin Core and Metadata Applications. pp. 65–74 (2017)
12. McDonough, J., Kirschenbaum, M., Reside, D., Fraistat, N., Jerz, D.: Twisty Little Passages Almost All Alike: Applying the FRBR Model to a Classic Computer Game. Digital Humanities Quarterly 4(2) (2010)
13. IFLA Study Group on the Functional Requirements for Bibliographic Records: Functional Requirements for Bibliographic Records. https://www.ifla.org/publications/functional-requirements-for-bibliographic-records, [Online; accessed 15-August-2018]
14. Takhirov, N., Duchateau, F., A.T.: Linking FRBR Entities to LOD through Semantic Matching. Research and Advanced Technology for Digital Libraries. TPDL 2011. Lecture Notes in Computer Science 6966, 69–76 (2011)
15. Takhirov, N., Duchateau, F., A.T.: Supporting FRBRization of Web Product Descriptions. Research and Advanced Technology for Digital Libraries. TPDL 2011. Lecture Notes in Computer Science 6966, 284–295 (2011)
16. Tillett, B.: What is FRBR? A Conceptual Model for the Bibliographic Universe. Australian Library Journal 54(1), 24–30 (2005)
17. Vukadin, A.: Bits and Pieces of Information: Bibliographic Modeling of Transmedia. Cataloging & Classification Quarterly 52(3), 285–302 (2014)