Estimation of Character Diagram from Open Movie Database using Markov Logic Network

Yuta Ohwatari y-ohwatari@ohsuga.is.uec.ac.jp Graduate School of Information Systems University of Electro-Communications

Tokyo Japan

Takahiro Kawamura kawamura@ohsuga.is.uec.ac.jp Graduate School of Information Systems University of Electro-Communications

Tokyo Japan

Yuichi Sei sei@is.uec.ac.jp Graduate School of Information Systems University of Electro-Communications

Tokyo Japan

Yasuyuki Tahara tahara@is.uec.ac.jp Graduate School of Information Systems University of Electro-Communications

Tokyo Japan

Akihiko Ohsuga ohsuga@uec.ac.jp Graduate School of Information Systems University of Electro-Communications

Tokyo Japan

Keywords: Markov Logic Network, Semantic Analysis, Open Movie Database

In this paper, we propose a method for estimating interpersonal relationships between characters from movie script databases on the Web using Markov Logic Network. Markov Logic Network allows inference even when some rules are violated. In experiments, we confirmed that the proposed method can estimate favor between the characters in a movie with a precision of 69.8%.

Introduction

Every year, a large number of movies are released. If a user wants to quickly learn about a movie, he or she will read a summary of the movie. Therefore, effective summarization is required for a better understanding of a movie.

An overview of our proposed method is illustrated in Figure 1. Our method is separated into the estimation of interpersonal relationships and the generation of character diagrams.

First, we prepare script data for learning and inference by extracting who speaks what to whom from a movie database. Then, we estimate the sentiment polarity of lines in the script and the favorable impression between a speaker and a listener in a movie using Markov Logic Network. Finally, we generate the character diagram of a movie from the estimated interpersonal relationships.

Fig. 1. Method workflow

A first-order knowledge base can be seen as a set of hard constraints on the set of possible worlds. However, real-world data often violate some of these constraints, so the actual solution frequently lies among the worlds that the knowledge base rules out. Markov Logic Network solves this problem by associating each formula with a weight that reflects how strong the constraint is. In addition, Markov Networks are laborious to construct by hand, and Markov Logic Network can be viewed as a template for constructing them. Markov Logic Network (MLN) is a probabilistic extension of finite first-order logic [4], which compensates for the disadvantages of both Markov Networks and first-order logic.
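As a minimal illustration of how weights soften constraints, the sketch below enumerates all possible worlds for one weighted ground formula and gives each world a probability proportional to exp(sum of w_i * n_i(x)), the standard MLN distribution described in [4]. The atom names, the weight of 1.5, and the function names are assumptions made for this example; they are not taken from the model used in this paper.

```python
import itertools
import math

def world_probabilities(atoms, weighted_formulas):
    """Return {world: probability}, with P(x) proportional to exp(sum_i w_i * n_i(x))."""
    worlds = list(itertools.product([False, True], repeat=len(atoms)))
    scores = []
    for values in worlds:
        world = dict(zip(atoms, values))
        # n_i(x) is the number of true groundings of formula i in this world.
        total = sum(w * count(world) for w, count in weighted_formulas)
        scores.append(math.exp(total))
    z = sum(scores)  # partition function
    return {values: s / z for values, s in zip(worlds, scores)}

def implication_count(world):
    """Count true groundings of: Lpol(T1, "P") => Likes(A, B) (a single grounding)."""
    return 1 if (not world["Lpol_T1_P"]) or world["Likes_A_B"] else 0

atoms = ["Lpol_T1_P", "Likes_A_B"]
probs = world_probabilities(atoms, [(1.5, implication_count)])  # hypothetical weight

violating = probs[(True, False)]   # positive line, but no favor: rule violated
satisfying = probs[(True, True)]   # positive line and favor: rule satisfied
print(violating, satisfying)
```

The world that violates the rule keeps a non-zero probability; it is only less likely than the worlds that satisfy it, by a factor of exp(1.5) in this toy setting.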

Note that, as the implementation of MLN in this paper, we used the learning and inference algorithms provided by the open-source Alchemy¹.

Related Work

Tanaka et al. [7] present interpersonal relationships extracted from sentence structures as a summary of a story. We therefore consider it effective to present the relationships between characters as a summary. On the other hand, analysis of e-mails [1] and estimation from the co-occurrence of names [2] are studies on estimating relationships between persons in the real world. However, the estimation of interpersonal relationships between fictional characters has not been studied.

There are many studies using MLN, for example, entity resolution [5] and information extraction [3]. These studies focus on global constraints and build a model using MLN. We also target text and extract information based on global constraints.

Defined Rules

We defined rules for MLN to estimate interpersonal relationships. These rules determine the sentiment polarity of lines in the script from the sentiment polarity of words, and the favor between characters from the sentiment polarity of lines. To obtain the sentiment polarity of words, we incorporated the Semantic Orientations of Words Dictionary built by Takamura et al. [6]. This dictionary assigns each word a real value in the range from -1 to +1, where words with values close to -1 are supposed to be negative and words with values close to +1 are supposed to be positive. The vocabulary was extracted from WordNet².
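The dictionary values are real numbers, whereas the rules operate on a discrete polarity label. A minimal sketch of one possible mapping follows. The sign threshold at 0, the function name, and the dictionary entries are assumptions made for illustration, since the paper does not state how the values were discretized.

```python
def binarize_polarity(score, threshold=0.0):
    """Map a real-valued semantic orientation in [-1, +1] to "P" or "N".

    The threshold of 0.0 is an assumption, not a value given in the paper.
    """
    if not -1.0 <= score <= 1.0:
        raise ValueError("semantic orientation must lie in [-1, +1]")
    return "P" if score > threshold else "N"

# Hypothetical dictionary entries (values are made up for illustration).
orientation = {"wonderful": 0.93, "terrible": -0.98, "wish": 0.12}
wpol = {word: binarize_polarity(score) for word, score in orientation.items()}
print(wpol)
```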

In this paper, we limited the polarity to a two-valued attribute: positive (+1) and negative (−1). An observed predicate is a predicate whose arguments are all given in both inference and training. A hidden predicate is a predicate with an argument that is not given in inference but is given in training. Observed predicates and hidden predicates in this paper are shown in Table 1. We describe some of the logical rules for each script line below. t and l are variables. A constant is enclosed in double quotes. An underscore means an arbitrary value. If (+) is attached to the front of a variable, the variable is replaced by each of the constants that appear in the actual data (grounding).

Word(t, l, +w) ∧ Wpol(+w, +p) ⇒ Lpol(t, +p)
Line(t, +sp, +li) ∧ Lpol(t, "P") ⇒ Likes(+sp, +li)
Line(t, +sp, +li) ∧ Lpol(t, "N") ⇒ ¬Likes(+sp, +li)
...
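The expansion of (+) variables described above can be sketched as follows. This is an illustrative string-level expansion, not the grounding procedure of Alchemy; the function name, the ASCII operators `^` and `=>`, and the small constant domains are assumptions made for this example.

```python
import re
from itertools import product

def expand_plus_variables(template, plus_domains):
    """Create one formula per combination of constants for the '+' variables."""
    names = list(plus_domains)
    formulas = []
    for combo in product(*(plus_domains[name] for name in names)):
        formula = template
        for name, constant in zip(names, combo):
            # \b prevents "+s" from matching inside a longer name such as "+sp".
            pattern = r"\+" + re.escape(name) + r"\b"
            quoted = '"%s"' % constant
            formula = re.sub(pattern, lambda _match, q=quoted: q, formula)
        formulas.append(formula)
    return formulas

template = "Word(t, l, +w) ^ Wpol(+w, +p) => Lpol(t, +p)"
domains = {"w": ["wish", "terrible"], "p": ["P", "N"]}  # hypothetical constants
formulas = expand_plus_variables(template, domains)
for formula in formulas:
    print(formula)
```

With two words and two polarity values, the template yields four formulas; the variables t and l are left unbound in each of them.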

Experiment on Relation Extraction

Datasets

In the experiment, we used movie script data from IMSDb: The Internet Movie Script Database³ on the Web. The titles of the movies used in the experiment are Back to the Future (1985), Good Will Hunting (1997), Harry Potter And The Sorcerer's Stone (2001), The Lord of the Rings The Fellowship of the Ring (2001), and Star Wars Episode I The Phantom Menace (1999). The average numbers of lines and characters are 704.6 and 42.6, respectively.

Setting

We used one movie for testing and the remaining four movies as training data. Because estimation results are expressed as probabilities, we treated an estimate as true when its probability was above the mean value of the probabilities. Note that the Likes predicates in the training data were described by a person who is familiar with the movies and has actually seen them.
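The thresholding step can be sketched as below. The character names, the probabilities, and the function name are made up for illustration and are not results from the experiment.

```python
def classify_by_mean(probabilities):
    """Label each Likes ground atom True if its probability exceeds the mean."""
    mean = sum(probabilities.values()) / len(probabilities)
    labels = {atom: p > mean for atom, p in probabilities.items()}
    return labels, mean

# Hypothetical inference output: (speaker, listener) -> P(Likes(speaker, listener)).
inferred = {
    ("ALICE", "BOB"): 0.08,
    ("ALICE", "CAROL"): 0.02,
    ("BOB", "ALICE"): 0.04,
}
labels, mean = classify_by_mean(inferred)
print(labels, mean)
```

Using the mean as the cut-off makes the decision relative to the other estimates, so an atom can be treated as true even when its absolute probability is small.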

Result

The experimental results are shown in Table 2. The training time was about 19 hours in total, and the inference time was about 3 hours in total. Recall is lower than precision. In addition, Figure 2 shows an example of the character diagram generated from the estimated interpersonal relationships. In this figure, a node represents a person and an edge represents a relationship. The label on an edge shows the estimated probability of the predicate Likes() and the mean of the probability (like or not like). A dashed edge indicates an incorrect estimation. This figure broadly captures the interpersonal relationships in Star Wars Episode I The Phantom Menace (1999).
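A diagram of this kind can be produced from the estimates with a few lines of code. The sketch below emits Graphviz DOT text; the use of DOT, the function name, the edge data, and the exact label layout are assumptions made for illustration, since the paper does not state how the diagram was drawn.

```python
def to_dot(edges, mean):
    """Render Likes estimates as Graphviz DOT text.

    edges: list of (speaker, listener, probability, is_correct) tuples.
    mean:  mean probability used as the like / not like cut-off.
    """
    lines = ["digraph characters {"]
    for speaker, listener, prob, is_correct in edges:
        verdict = "like" if prob > mean else "not like"
        style = "solid" if is_correct else "dashed"  # dashed marks a wrong estimate
        lines.append(
            '  "%s" -> "%s" [label="%.4f (%s)", style=%s];'
            % (speaker, listener, prob, verdict, style)
        )
    lines.append("}")
    return "\n".join(lines)

# Hypothetical estimates (not results from the paper).
edges = [
    ("ALICE", "BOB", 0.08, True),
    ("ALICE", "CAROL", 0.02, False),
]
dot = to_dot(edges, mean=0.05)
print(dot)
```

The resulting text can be passed to any Graphviz-compatible renderer to obtain a drawing with one node per character and one directed edge per estimated relationship.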

Conclusion

In this paper, using MLN on a movie script database, we estimated the sentiment polarity of script lines and the interpersonal relationships between the characters in a movie. In the experiments, we confirmed that our proposed method estimated favor between the characters in a movie with a precision of 69.8%. In the future, we will improve the model to achieve higher accuracy.

Table 1. Observed and hidden predicates

Type                 Predicate                       Description
Observed predicates  Line(text, speaker, listener)   speaker speaks text to listener
                     Word(text, position, word)      word is in text and its position is position
                     Wpol(word, pol)                 The sentiment polarity of word is pol
Hidden predicates    Lpol(text, pol)                 The sentiment polarity of text is pol
                     Likes(person, person)           Favor

Table 2. Estimated relationships and the number of grounded rules

       Precision (%)   Recall (%)   F-measure (%)   Ground clauses
mean   69.752          45.332       53.392          46,203

Fig. 2. Part of character diagram generated from the estimated relationship

¹ http://alchemy.cs.washington.edu/
² http://wordnet.princeton.edu
³ http://www.imsdb.com/

Acknowledgements

This work was supported by JSPS KAKENHI Grant Numbers 24300005, 26330081, 26870201.

References

[1] L. A. Adamic, E. Adar: Friends and neighbors on the web. Social Networks 25(3) (2003)
[2] Y. Matsuo, H. Tomobe, K. Hashida, H. Nakajima, M. Ishizuka: Social network extraction from the web information. Transactions of the Japanese Society for Artificial Intelligence 20(1) (2005)
[3] H. Poon, P. Domingos: Joint inference in information extraction. AAAI 7 (2007)
[4] M. Richardson, P. Domingos: Markov logic networks. Machine Learning 62(1-2) (2006)
[5] P. Singla, P. Domingos: Entity resolution with Markov logic. Data Mining, 2006. ICDM'06. Sixth International Conference on. IEEE (2006)
[6] H. Takamura, T. Inui, M. Okumura: Extracting semantic orientations of words using spin model. Proc. the 43rd Annual Meeting on Association for Computational Linguistics. Association for Computational Linguistics (2005)
[7] S. Tanaka, M. Okabe, R. Onai: Interactive narrative summarization. Workshop on Interactive Systems and Software (2011)