Discovering Mise-en-scène in Movies by Analyzing Scripts*

O-Joun Lee[0000-0001-8921-5443] and Jin-Taek Kim**

Future IT Innovation Laboratory, Pohang University of Science and Technology
77, Cheongam-ro, Nam-gu, Pohang-si, Gyeongsangbuk-do, Republic of Korea 37673
{ojlee112358,jintaek}@postech.ac.kr

* Copyright © by the paper's authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). In: J.-T. Kim, J. J. Jung, E. You, O.-J. Lee (eds.): Proceedings of the 1st International Workshop on Computational Humanities and Social Sciences (Computing4Human 2020), Pohang, Republic of Korea, 15-February-2020, published at http://ceur-ws.org
** Corresponding author.

Abstract. This study aims to analyze mise-en-scène in movies. Although mise-en-scène includes all the entities that appear in a scene, we focus on the metaphorical meanings of these entities. When entities express the same metaphorical symbol, their lexical meanings will be related to each other. Also, if a metaphorical symbol is significant for storytelling, it will appear in most of the scenes. Therefore, we find groups of entities that share similar lexical meanings and appear all over the movie. As a preliminary study, we have applied these approaches to terms in movie scripts with simple natural language processing techniques. Although we have not conducted an evaluation yet, we anticipate that the proposed method will be helpful for analyzing the artistic and topical intentions of movie directors.

Keywords: Computational Narrative Analysis, Mise-en-scène, Movie Script Analysis, Metaphorical Symbol.

1 Introduction

Mise-en-scène literally indicates the things placed in a scene [2,11]. Simply speaking, this concept covers everything that appears in a scene, e.g., characters, props, and backgrounds. Although a few existing studies [1,4] have addressed mise-en-scène in films, they focus on mise-en-scène as physical expressions, e.g., lighting, composition, and camera work. These studies cannot cover the meanings of the expressions, which can be metaphorical symbols placed by directors. For example, in the movie 'Parasite (2019),' vertical movements (e.g., walking up the stairs) signify changes in social class. In this study, we deal with mise-en-scène as a metaphorical symbol.

We cannot interpret the meaning of each symbol; instead, we attempt to discover which entities in a film carry metaphorical intentions. Thus, unlike our previous studies [5,6,7,8,10], which focus on interactions between characters, this study concentrates on the entities that appear in scenes. Also, as a preliminary study, we analyze terms in movie scripts rather than detecting entities in videos.

To discover metaphorical symbols, we first preprocess the movie script. Based on the formalized structure of the script, we extract the terms used for describing backgrounds, events, or characters. Since a metaphorical symbol is expressed by various terms, we cluster the terms in the script according to their semantic relevance by using WordNet [14]. The resulting clusters of terms are candidates for metaphorical symbols. Then, we attempt to distinguish metaphorical symbols from the other clusters, whose terms are used in their lexical meanings.

2 Movie Scripts

Movie scripts are highly formalized documents, as displayed in Fig. 1. Various information (e.g., scene headings, action descriptions, names of characters, dialogues, and background descriptions) is written according to a consistent convention. When a metaphorical symbol is placed in the spatial background (e.g., stairs or basements), it will be depicted in the background description or even in the scene heading. If the spatial background is related to the behaviors of characters, it can be described in the action description (e.g., walking up the stairs).
Also, if the symbol is a prop, it will be depicted in the dialogue or the action description. Thus, we extract terms from the scene headings, background descriptions, action descriptions, and dialogues in movie scripts. Then, we compose the set of terms that occur in each scene. It is difficult to determine directly which terms have metaphorical meanings. However, a metaphorical meaning also originates from a lexical meaning. Therefore, terms related to a metaphorical symbol will have similar lexical meanings and appear in most of the scenes.

3 Discovering Metaphorical Symbols

Metaphorical symbols are normally abstract concepts, whereas the terms in a script annotate their concrete expressions in the background, dialogue, and behavior. In the case of 'Parasite (2019),' Joon-ho Bong uses vertical movements as metaphors, and these movements are expressed mainly by stairs and basements. Thus, individual terms in a script can rarely be metaphorical symbols by themselves. We first collect the terms that appear frequently. Then, we discover groups of terms that can be used for expressing the same metaphorical meaning. If a set of terms expresses the same metaphorical symbol, the terms will be related in their lexical meanings. For example, stairs and basements both have meanings related to height. Although such expressions can carry multi-layered metaphors, film is one of the most commercial and popular media, so its symbols are presumably meant to stay recognizable through their lexical meanings.

To reveal the semantic relevance, we use WordNet [14]. WordNet is a lexical database of English words, which contains not only definitions of the words but also relations between them. These relations include hypernyms, hyponyms, coordinate terms, meronyms, holonyms, troponyms, etc.

7 INT. ELEVATOR, APARTMENT BUILDING - CONTINUOUS 7   // Scene heading
Joker steps onto the wheezing elevator, harsh fluorescent lights, graffiti on the walls.
As the door closes, he hears --   // Action and background description
SOPHIE (OS)   // Name of a speaker
Wait!!   // Dialogue
He puts his foot out with some panache to stop the closing door -- He's a romantic at heart. Ding.   // Action description
And SOPHIE DUMOND (late 20's), tired eyes, hands filled with grocery bags, steps onto the elevator with GIGI, her 5-year-old daughter.
SOPHIE
Thank you.
(realizing)
Of course it's you,-- everyone else in this building is just so fucking rude.
Joker nods "thanks." Holds his breath, hoping he doesn't start to laugh.
Floors dinging as the elevator rises.   // Background description
Fig. 1: A part of the script of 'Joker (2019).'

For the terms in a movie script, we apply stemming and remove stop words (e.g., articles, pronouns, and auxiliary verbs) as well as punctuation marks. Then, we measure the distances between the terms by using WordNet, with two approaches. First, we use the length of the shortest path between two terms in WordNet. Since the relations in WordNet indicate semantic relationships between terms, if one term can be reached from another in fewer hops, we can assume that the two terms are semantically more related. Second, some terms are not directly related in their semantics, although they are conceptually relevant. Although both stairs and basements evoke 'height', the two terms are not close to each other in WordNet. Thus, we embed the terms in the movie script based on the terms in their definitions by using SkipGram in Word2Vec [13,5]. We do not train SkipGram on the sentences of the script itself, since script language is closer to everyday speech than to lucid, well-formed writing. The vector representations of terms enable us to compare the semantics of the terms conveniently. In this study, we use the Euclidean distance to compare the vectors. When a term has multiple homonyms (i.e., multiple senses), it is difficult to determine which meaning the term has in the current context.
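A minimal sketch of the two distance measures described in this section, under explicit assumptions: the lexical graph, the term-to-sense map, and the 2-D sense vectors are hypothetical toy data standing in for WordNet and for SkipGram embeddings trained on definitions (with real data, NLTK's WordNet interface, e.g. `wn.synsets` and `Synset.shortest_path_distance`, and a skip-gram model such as gensim's `Word2Vec(..., sg=1)` could supply these values). Following the text, a term with several homonyms is compared through the arithmetic mean over all of its sense pairs, and the pairwise distances are min-max normalized into [0, 1]. The text does not state how the two distances are combined, so the sketch keeps them separate. It needs Python 3.8+ for `math.dist`.

```python
import math
from collections import deque
from itertools import combinations, product

# HYPOTHETICAL toy lexical graph standing in for WordNet: nodes are senses,
# edges are lexical relations (hypernymy, meronymy, ...). Not real WordNet data.
TOY_RELATIONS = [
    ("stairs#1", "way#1"), ("way#1", "artifact#1"),
    ("basement#1", "room#1"), ("room#1", "area#1"), ("area#1", "structure#1"),
    ("structure#1", "artifact#1"), ("floor#1", "structure#1"),
    ("floor#2", "surface#1"), ("surface#1", "artifact#1"),
]
# HYPOTHETICAL term -> senses map; "floor" has two homonyms.
TOY_SENSES = {"stairs": ["stairs#1"], "basement": ["basement#1"],
              "floor": ["floor#1", "floor#2"]}
# HYPOTHETICAL 2-D sense vectors standing in for SkipGram embeddings of definitions.
TOY_VECTORS = {"stairs#1": (0.9, 0.1), "basement#1": (0.8, -0.1),
               "floor#1": (0.5, 0.5), "floor#2": (0.0, 1.0)}


def build_graph(edges):
    """Undirected adjacency sets, so relations can be followed in both directions."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def hops(graph, src, dst):
    """Breadth-first search: fewest relation hops from src to dst (inf if unreachable)."""
    queue, seen = deque([(src, 0)]), {src}
    while queue:
        node, depth = queue.popleft()
        if node == dst:
            return depth
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, depth + 1))
    return math.inf


def term_distance(sense_dist, t1, t2, senses=TOY_SENSES):
    """Arithmetic mean of a sense-level distance over all homonym pairs of t1 and t2."""
    pairs = list(product(senses[t1], senses[t2]))
    return sum(sense_dist(a, b) for a, b in pairs) / len(pairs)


def normalized(terms, dist):
    """Pairwise term distances min-max scaled into [0, 1]; unreachable pairs map to 1."""
    raw = {(a, b): dist(a, b) for a, b in combinations(terms, 2)}
    finite = [d for d in raw.values() if math.isfinite(d)]
    lo, hi = min(finite, default=0.0), max(finite, default=0.0)
    span = (hi - lo) or 1.0
    return {p: (d - lo) / span if math.isfinite(d) else 1.0 for p, d in raw.items()}


GRAPH = build_graph(TOY_RELATIONS)
TERMS = ["stairs", "basement", "floor"]


def path_sense(x, y):
    """Approach 1: shortest-path length between two senses on the lexical graph."""
    return hops(GRAPH, x, y)


def emb_sense(x, y):
    """Approach 2: Euclidean distance between two (toy) sense vectors."""
    return math.dist(TOY_VECTORS[x], TOY_VECTORS[y])


path_d = normalized(TERMS, lambda a, b: term_distance(path_sense, a, b))
emb_d = normalized(TERMS, lambda a, b: term_distance(emb_sense, a, b))
print(path_d)
print(emb_d)
```

On this toy data, stairs and basement form the farthest pair on the graph but the closest pair in the embedding space, which mirrors the motivation given for the second measure.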
For a term with multiple homonyms, we measure the distances for all of its homonyms and aggregate them by the arithmetic mean. After normalizing these distances into [0, 1], we cluster the terms using k-nearest neighbor (k-NN) based clustering. The clustering minimizes the semantic distances between terms within each cluster. We determine the number of clusters (k) by maximizing the average distance between clusters (external separation) and minimizing the average distance among terms within each cluster (internal compactness).

However, not all the clusters will correspond to metaphorical symbols. Most of them might be related to genres, insignificant subjects, or everyday vocabulary. We distinguish metaphorical symbols from the other clusters, which consist of terms used in their lexical meanings, by using two heuristics. First, a metaphorical symbol delivers the director's intention in an indirect and sophisticated way, so the symbol will be used in most of the scenes of a movie. Second, stories in narrative multimedia are driven by conflicts among characters (mainly around the protagonists) [12,16]. A conflict is a process in which the characters try to restore normality in their lives against oppressive situations. This process forces the characters to change, and the director conveys artistic or topical intentions through this change. Thus, in many movies, metaphorical symbols allude to the purposes, personalities, or situations of characters [2,11], and the characters implied by the symbols are likely to be significant ones. Therefore, we assess each term cluster by the number of scenes in which its terms appear. Also, to account for the significance of characters, we use the node centrality of the characters in character networks, as in our previous studies [3,9,10,15].
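A minimal sketch of the cluster scoring described in this section: the centrality of a character is taken as the average of its normalized degree, betweenness (computed with Brandes' algorithm), and closeness centrality, and a term cluster is scored as in Eq. (1), by summing, over the scenes in which any of its terms occurs, the centralities of the characters present. All inputs are hypothetical: the toy scenes, character names, terms, and cluster labels are invented for illustration, and the character network here simply links characters who share a scene, as an assumed stand-in for CharNetBuilder (whose actual construction is not reproduced). The network is treated as unweighted and undirected, and the closeness formula assumes it is connected.

```python
from collections import deque
from itertools import combinations

# HYPOTHETICAL toy scenes: characters present and (stemmed) terms occurring.
SCENES = [
    {"characters": {"A", "B"}, "terms": {"stairs", "rain"}},
    {"characters": {"A", "C"}, "terms": {"basement", "stairs"}},
    {"characters": {"B", "D"}, "terms": {"party"}},
    {"characters": {"A", "B", "C"}, "terms": {"basement"}},
]


def cooccurrence_network(scenes):
    """Assumed stand-in for CharNetBuilder: link characters that share a scene."""
    graph = {}
    for scene in scenes:
        for name in scene["characters"]:
            graph.setdefault(name, set())
        for a, b in combinations(sorted(scene["characters"]), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def bfs_distances(graph, src):
    """Hop distances from src to every reachable character."""
    dist, queue = {src: 0}, deque([src])
    while queue:
        v = queue.popleft()
        for w in graph[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def betweenness(graph):
    """Brandes' algorithm for unweighted, undirected graphs, normalized to [0, 1]."""
    bc = dict.fromkeys(graph, 0.0)
    for s in graph:
        stack, preds = [], {v: [] for v in graph}
        sigma, dist = dict.fromkeys(graph, 0), dict.fromkeys(graph, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:  # BFS counting shortest paths (sigma) from s
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(graph, 0.0)
        while stack:  # accumulate dependencies in reverse BFS order
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    n = len(graph)
    # Each unordered pair is counted twice, so divide by 2 * C(n-1, 2) = (n-1)(n-2).
    scale = 1.0 / ((n - 1) * (n - 2)) if n > 2 else 0.0
    return {v: b * scale for v, b in bc.items()}


def average_centrality(graph):
    """Mean of normalized degree, betweenness, and closeness centrality per character."""
    n = len(graph)
    btw = betweenness(graph)
    result = {}
    for v in graph:
        dist = bfs_distances(graph, v)
        total = sum(dist.values())
        closeness = (len(dist) - 1) / total if total else 0.0
        degree = len(graph[v]) / (n - 1) if n > 1 else 0.0
        result[v] = (degree + btw[v] + closeness) / 3
    return result


def significance(cluster, scenes, centrality):
    """Eq. (1): sum over scenes of I(s, C_n) times the summed centrality of s's characters."""
    return sum(
        sum(centrality[c] for c in scene["characters"])
        for scene in scenes
        if cluster & scene["terms"]  # indicator I(s, C_n): some cluster term occurs in s
    )


CENTRALITY = average_centrality(cooccurrence_network(SCENES))
SCORES = {
    "vertical": significance({"stairs", "basement"}, SCENES, CENTRALITY),
    "event": significance({"party", "rain"}, SCENES, CENTRALITY),
}
print(SCORES)
```

On this toy input, the cluster {stairs, basement} scores higher than {party, rain}, because its terms occur in more scenes and in scenes with more central characters.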
Combining scene occurrence and character centrality, the significance of the n-th term cluster $C_n$ is formulated as:

$$S(C_n) = \sum_{\forall s_{\alpha,l} \in C_\alpha} I(s_{\alpha,l}, C_n) \times \mathcal{C}(s_{\alpha,l}) \qquad (1)$$

where $s_{\alpha,l} \in C_\alpha$ indicates the $l$-th scene in a movie $C_\alpha$, $I(s_{\alpha,l}, C_n)$ is an indicator function for whether terms in $C_n$ occur in $s_{\alpha,l}$, and $\mathcal{C}(s_{\alpha,l})$ refers to the sum of the node centralities of the characters that appear in $s_{\alpha,l}$. For composing character networks, CharNetBuilder¹ is used, and the scenes in the script are segmented by the scene headings. The centrality of a character is measured by the average of its degree, betweenness, and closeness centrality. Finally, we can choose the term clusters that have distinctively higher significance than the others by using thresholds or clustering.

¹ https://github.com/O-JounLee/CharNetBuilder

4 Conclusion

This study has proposed a method for discovering mise-en-scène in movies by analyzing movie scripts, concentrating on mise-en-scène as the metaphorical symbol. The proposed method is significant, since the existing studies on computationally analyzing mise-en-scène have focused only on physical expressions rather than their meanings. However, this study also has several limitations. First, since we have not yet evaluated the proposed method, this approach still remains a theoretical methodology. Also, narrative multimedia (including movies) employ various techniques spanning multiple kinds of expressions (i.e., visual, audible, and verbal expressions are used in combination). Thus, to discover mise-en-scène with higher accuracy, analyzing and synchronizing multi-modal data is necessary, as in our previous study [9].

Acknowledgment

This research was supported by the MSIT (Ministry of Science and ICT), Korea, under the ICT Consilience Creative program (IITP-2019-2011-1-00783) supervised by the IITP (Institute for Information & communications Technology Planning & Evaluation).

References

1. Deldjoo, Y., Elahi, M., Cremonesi, P., Garzotto, F., Piazzolla, P.: Recommending movies based on mise-en-scene design. In: Kaye, J., Druin, A., Lampe, C., Morris, D., Hourcade, J.P. (eds.) Proceedings of the 2016 CHI Conference Extended Abstracts on Human Factors in Computing Systems (CHI EA 2016). pp. 1540–1547. ACM Press, San Jose, CA, USA (May 2016). https://doi.org/10.1145/2851581.2892551
2. Deleyto, C.: Focalisation in film narrative. Atlantis 13(1/2), 159–177 (Nov 1991)
3. Jung, J.E., Lee, O.J., You, E.S., Nam, M.H.: A computational model of transmedia ecosystem for story-based contents. Multimedia Tools and Applications 76(8), 10371–10388 (Apr 2017). https://doi.org/10.1007/s11042-016-3626-5
4. Kvisgaard, A., Klem, S.O., Nielsen, T.L., Rafferty, E.I., Nilsson, N.C., Hoeg, E.R., Nordahl, R.: Frames to zones: Applying mise-en-scène techniques in cinematic virtual reality. In: Proceedings of the 5th IEEE Workshop on Everyday Virtual Reality (WEVR 2019). IEEE, Osaka, Japan (Mar 2019). https://doi.org/10.1109/wevr.2019.8809592
5. Lee, O.J.: Learning Distributed Representations of Character Networks for Computational Narrative Analytics. Ph.D. thesis, Chung-Ang University, Seoul, Republic of Korea (Aug 2019)
6. Lee, O.J., Jo, N., Jung, J.J.: Measuring character-based story similarity by analyzing movie scripts. In: Jorge, A.M., Campos, R., Jatowt, A., Nunes, S. (eds.) Proceedings of the 1st Workshop on Narrative Extraction From Text (Text2Story 2018) co-located with the 40th European Conference on Information Retrieval (ECIR 2018). CEUR Workshop Proceedings, vol. 2077, pp. 41–45. CEUR-WS.org, Grenoble, France (Mar 2018)
7. Lee, O.J., Jung, J.J.: Explainable movie recommendation systems by using story-based similarity. In: Said, A., Komatsu, T. (eds.) Joint Proceedings of the ACM IUI 2018 Workshops co-located with the 23rd ACM Conference on Intelligent User Interfaces (ACM IUI 2018). CEUR Workshop Proceedings, vol. 2068. CEUR-WS.org, Tokyo, Japan (Mar 2018)
8. Lee, O.J., Jung, J.J.: Character network embedding-based plot structure discovery in narrative multimedia. In: Akerkar, R., Jung, J.J. (eds.) Proceedings of the 9th International Conference on Web Intelligence, Mining and Semantics (WIMS 2019). pp. 15:1–15:9. ACM, Seoul, Republic of Korea (Jun 2019). https://doi.org/10.1145/3326467.3326485
9. Lee, O.J., Jung, J.J.: Integrating character networks for extracting narratives from multimodal data. Information Processing and Management 56(5), 1894–1923 (Sep 2019). https://doi.org/10.1016/j.ipm.2019.02.005
10. Lee, O.J., Jung, J.J.: Modeling affective character network for story analytics. Future Generation Computer Systems 92, 458–478 (Mar 2019). https://doi.org/10.1016/j.future.2018.01.030
11. Martin, A.: Mise en Scène and Film Style: From Classical Hollywood to New Media Art. Palgrave Close Readings in Film and Television, Palgrave Macmillan UK (2014). https://doi.org/10.1057/9781137269959
12. McKee, R.: Story: Substance, Structure, Style and the Principles of Screenwriting. HarperCollins, New York, NY, USA (Nov 1997)
13. Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S., Dean, J.: Distributed representations of words and phrases and their compositionality. In: Burges, C.J.C., Bottou, L., Ghahramani, Z., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 26: Proceedings of 27th Annual Conference on Neural Information Processing Systems (NIPS 2013). pp. 3111–3119. Curran Associates, Inc., Lake Tahoe, Nevada, US (Dec 2013)
14. Miller, G.A.: WordNet: a lexical database for English. Communications of the ACM 38(11), 39–41 (Nov 1995). https://doi.org/10.1145/219717.219748
15. Tran, Q.D., Hwang, D., Lee, O.J., Jung, J.E.: Exploiting character networks for movie summarization. Multimedia Tools and Applications 76(8), 10357–10369 (Apr 2017). https://doi.org/10.1007/s11042-016-3633-6
16. Truby, J.: The anatomy of story: 22 steps to becoming a master storyteller. Farrar, Straus and Giroux (Oct 2008)