
Estimation of Character Diagram from Open Movie Database using Markov Logic Network

Yuta Ohwatari

Takahiro Kawamura

kawamura@ohsuga.is.uec.ac.jp 0

Yuichi Sei

sei@is.uec.ac.jp 0

Yasuyuki Tahara

tahara@is.uec.ac.jp 0

Akihiko Ohsuga

ohsuga@uec.ac.jp 0

0 Graduate School of Information Systems, University of Electro-Communications, Tokyo, Japan

In this paper, we propose a method for estimating the interpersonal relationships of characters from movie script databases on the Web using a Markov Logic Network. A Markov Logic Network allows us to perform inference while tolerating violations of rules. In experiments, we confirmed that our proposed method can estimate favor between the characters in a movie with a precision of 69.8%.

Markov Logic Network, Semantic Analysis, Open Movie Database

Introduction

Every year, a large number of movies are released. A user who wants to learn about a movie quickly will read its summary. Therefore, effective summarization of a movie is required for a better understanding of the movie.

An overview of our proposed method is illustrated in Figure 1. Our method is separated into the estimation of interpersonal relationships and the generation of character diagrams.

First, we prepare script data for learning and inference by extracting who speaks what to whom from a movie database. Then, we estimate the sentiment polarity of the lines in the script and the favorable impression between a speaker and a listener in a movie using a Markov Logic Network. Finally, we generate the character diagram of the movie from the estimated interpersonal relationships.

[Figure 1: Overview of the proposed method]

A first-order knowledge base can be seen as a set of hard constraints on the set of possible worlds. However, situations in the real world often violate some of these constraints and therefore fall into the set of impossible worlds. A Markov Logic Network solves this problem by associating with each formula a weight that reflects how strong the constraint is. In addition, Markov Networks are laborious to construct by hand, and a Markov Logic Network can be viewed as a template for constructing them. A Markov Logic Network (MLN) is a probabilistic extension of finite first-order logic [4] that compensates for the disadvantages of both Markov Networks and first-order logic.
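For reference, the standard formulation of Richardson and Domingos [4] defines the probability of a possible world x as shown below. Here w_i is the weight of the i-th formula, n_i(x) is the number of its true groundings in x, and Z is the normalizing constant. A world that violates a formula is therefore not impossible, only less probable.

```latex
P(X = x) = \frac{1}{Z} \exp\Big( \sum_i w_i \, n_i(x) \Big),
\qquad
Z = \sum_{x'} \exp\Big( \sum_i w_i \, n_i(x') \Big)
```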

Note that in this paper we used the learning and inference algorithms provided by the open-source Alchemy¹ as the implementation of MLN.

Related Work

Tanaka et al. [7] present interpersonal relationships extracted from sentence structures as a summary of a story. We likewise consider it effective to present the relationships of characters as a summary. On the other hand, analysis of e-mails [1] and estimation from the co-occurrence of names [2] are studies that estimate the relationships of persons in the real world. However, the estimation of interpersonal relationships of fictional characters has not been studied.

There are many studies using MLN, for example, on entity resolution [5] and information extraction [3]. These studies focus on global constraints and build a model using MLN. We also target text and extract information under global constraints.

Defined Rules

We defined rules for the MLN to estimate interpersonal relationships. These rules determine the sentiment polarity of each line in the script from the sentiment polarity of its words, and the favor between characters from the sentiment polarity of the lines. To obtain the sentiment polarity of words, we incorporated the Semantic Orientations of Words dictionary built by Takamura et al. [6]. This dictionary assigns each word a real value in the range from -1 to +1, where words with values close to -1 are considered negative and words with values close to +1 are considered positive. The vocabulary was extracted from WordNet².

In this paper, we limited the polarity to a two-valued attribute: positive (+1) and negative (-1). An observed predicate is a predicate whose arguments are all given in both inference and training. A hidden predicate is a predicate with an argument that is not given in inference but is given in training. The observed and hidden predicates used in this paper are shown in Table 1.

¹ http://alchemy.cs.washington.edu/
² http://wordnet.princeton.edu

Table 1. Observed and hidden predicates

Predicate                      Meaning
Line(text, speaker, listener)  speaker speaks text to listener
Word(text, position, word)     word is in text and its position is position
Wpol(word, pol)                The sentiment polarity of word is pol
Lpol(text, pol)                The sentiment polarity of text is pol
Likes(person, person)          Favor
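As an illustration of how one script line could be turned into ground atoms of the Line, Word, and Wpol predicates, a minimal sketch follows. The names `build_evidence`, `binarize`, and `SCORES`, the toy scores, the whitespace tokenizer, and the cut at 0 used to binarize polarity are all assumptions made for this example; the paper does not specify them.

```python
from typing import Dict, List, Tuple

Atom = Tuple[str, Tuple]  # (predicate name, arguments)

# Hypothetical real-valued polarity scores in [-1, +1] (toy values).
SCORES: Dict[str, float] = {"love": 0.9, "hate": -0.8}


def binarize(score: float) -> str:
    """Map a real-valued score to "P" or "N" (assumed cut at 0)."""
    return "P" if score >= 0 else "N"


def build_evidence(text_id: str, speaker: str, listener: str, text: str) -> List[Atom]:
    """Return Line, Word and Wpol ground atoms for one script line."""
    atoms: List[Atom] = [("Line", (text_id, speaker, listener))]
    for position, word in enumerate(text.lower().split()):
        atoms.append(("Word", (text_id, position, word)))
        if word in SCORES:
            atoms.append(("Wpol", (word, binarize(SCORES[word]))))
    return atoms


# One toy line spoken by "Alice" to "Bob".
evidence = build_evidence("T1", "Alice", "Bob", "I love you")
```

Lpol and Likes atoms are absent from this evidence because they are the quantities to be estimated.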

We describe some of the logical rules for each script line below. t and l are variables. A constant is enclosed in double quotes. An underscore denotes an arbitrary value. If (+) is attached to the front of a variable, the variable is replaced by each of the constants that appear in the actual data (grounding).

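$$\mathrm{Word}(t, l, +w) \wedge \mathrm{Wpol}(+w, +p) \Rightarrow \mathrm{Lpol}(t, +p)$$
$$\mathrm{Line}(t, +sp, +li) \wedge \mathrm{Lpol}(t, \text{"P"}) \Rightarrow \mathrm{Likes}(+sp, +li)$$
$$\mathrm{Line}(t, +sp, +li) \wedge \mathrm{Lpol}(t, \text{"N"}) \Rightarrow \neg \mathrm{Likes}(+sp, +li)$$
$$\vdots$$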
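To make concrete how weighted rules of this kind tolerate violations, the following minimal sketch computes the probability of Likes for a single speaker-listener pair by enumerating both possible worlds. It is not the Alchemy implementation used in the paper. As simplifying assumptions, the line polarities are treated as already known (in the paper Lpol is estimated), each rule has a single hypothetical weight (`W_POS`, `W_NEG`) rather than learned per-constant weights, and exact enumeration stands in for the inference algorithm.

```python
import math
from typing import List

# Hypothetical weights for the two Likes rules (not taken from the paper).
W_POS = 1.0  # Line(t, sp, li) ^ Lpol(t, "P") => Likes(sp, li)
W_NEG = 1.0  # Line(t, sp, li) ^ Lpol(t, "N") => !Likes(sp, li)


def implies(antecedent: bool, consequent: bool) -> bool:
    """Truth value of a material implication."""
    return (not antecedent) or consequent


def world_score(line_pols: List[str], likes: bool) -> float:
    """Sum over rules of weight * number of satisfied ground formulas."""
    n_pos = sum(implies(p == "P", likes) for p in line_pols)
    n_neg = sum(implies(p == "N", not likes) for p in line_pols)
    return W_POS * n_pos + W_NEG * n_neg


def prob_likes(line_pols: List[str]) -> float:
    """P(Likes = true), normalizing over the two possible worlds."""
    scores = {likes: math.exp(world_score(line_pols, likes)) for likes in (True, False)}
    return scores[True] / (scores[True] + scores[False])


# Three positive lines and one negative line from one speaker to one listener.
p = prob_likes(["P", "P", "P", "N"])
```

With equal weights, the single negative line lowers the probability but does not rule out Likes, which is the behavior that hard constraints cannot express.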

Experiment on Relation Extraction

Datasets

In the experiment, we used movie script data from IMSDb: The Internet Movie Script Database³ on the Web. The movies used in the experiment are Back to the Future (1985), Good Will Hunting (1997), Harry Potter And The Sorcerer's Stone (2001), The Lord of the Rings The Fellowship of the Ring (2001), and Star Wars Episode I The Phantom Menace (1999). The average numbers of lines and characters per movie are 704.6 and 42.6, respectively.

Setting

We used one movie for testing and the remaining four movies as training data. Because the estimation results are expressed as probabilities, we treated an estimate as true when its probability was above the mean of the probabilities. Note that the Likes predicates in the training data were described by a person who had actually seen the movies and was familiar with them.
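The decision rule above can be sketched as follows. The function name `decide_likes` and the probabilities are hypothetical, and the mean is assumed here to be taken over all pairs in the test movie.

```python
from statistics import mean
from typing import Dict, Tuple

Pair = Tuple[str, str]  # (speaker, listener)


def decide_likes(probs: Dict[Pair, float]) -> Dict[Pair, bool]:
    """Label a pair as Likes = true when its probability exceeds the mean."""
    threshold = mean(probs.values())
    return {pair: p > threshold for pair, p in probs.items()}


# Hypothetical inferred probabilities for three speaker-listener pairs.
probs = {("A", "B"): 0.08, ("B", "A"): 0.02, ("A", "C"): 0.06}
decisions = decide_likes(probs)
```

Using the mean rather than a fixed cut such as 0.5 makes the decision relative to the other pairs, which matters when all inferred probabilities are small.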

³ http://www.imsdb.com/

Result

The experimental results are shown in Table 2. The training time was about 19 hours in total, and the inference time was about 3 hours in total. Recall was lower than precision. In addition, Figure 2 shows an example of a character diagram generated from the estimated interpersonal relationships. In this figure, a node represents a person and an edge represents a relationship. The label on each edge shows the estimated probability of the predicate Likes() and the mean of the probability (like or not like). A dashed edge indicates an incorrect estimation. This figure broadly represents the interpersonal relationships of Star Wars Episode I The Phantom Menace (1999).
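For readers who want to reproduce the evaluation, the following sketch computes precision and recall for the Likes predicate, assuming the usual definitions over the positive class. The function name `precision_recall` and the labels are hypothetical toy values, not results from the paper.

```python
from typing import Dict, Hashable, Tuple


def precision_recall(
    predicted: Dict[Hashable, bool], gold: Dict[Hashable, bool]
) -> Tuple[float, float]:
    """Precision and recall of the positive (Likes = true) class."""
    tp = sum(1 for k, v in predicted.items() if v and gold.get(k, False))
    fp = sum(1 for k, v in predicted.items() if v and not gold.get(k, False))
    fn = sum(1 for k, v in gold.items() if v and not predicted.get(k, False))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical toy labels for four directed pairs.
predicted = {"A->B": True, "B->A": False, "A->C": True, "C->A": False}
gold = {"A->B": True, "B->A": True, "A->C": False, "C->A": True}
precision, recall = precision_recall(predicted, gold)
```

In this toy case recall falls below precision because two true Likes relations are missed, the same direction of imbalance as reported above.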

[Figure 2: Example of a character diagram generated from the estimated interpersonal relationships]
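A diagram following these conventions could be produced, for example, by emitting Graphviz DOT text, as in the sketch below. The use of DOT, the function name `to_dot`, the directed edges, the label layout, and the toy values are assumptions for illustration; the paper does not state how its diagrams were drawn.

```python
from typing import Dict, Tuple

Pair = Tuple[str, str]  # (speaker, listener)


def to_dot(probs: Dict[Pair, float], mean_prob: float, gold: Dict[Pair, bool]) -> str:
    """Render Likes estimates as Graphviz DOT text (dashed = wrong estimate)."""
    lines = ["digraph characters {"]
    for (speaker, listener), p in sorted(probs.items()):
        likes = p > mean_prob
        label = f"{p:.3f} ({mean_prob:.3f}) {'like' if likes else 'not like'}"
        style = "solid" if likes == gold[(speaker, listener)] else "dashed"
        lines.append(f'  "{speaker}" -> "{listener}" [label="{label}", style={style}];')
    lines.append("}")
    return "\n".join(lines)


# Hypothetical probabilities and gold labels for two directed pairs.
probs = {("A", "B"): 0.08, ("B", "A"): 0.02}
gold = {("A", "B"): True, ("B", "A"): True}
dot = to_dot(probs, mean_prob=0.05, gold=gold)
```

The resulting string can be saved to a file and rendered with any DOT-compatible tool.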

Conclusion

In this paper, we applied MLN to a movie script database and estimated the sentiment polarity of script lines and the interpersonal relationships of the characters in a movie. In the experiments, we confirmed that our proposed method estimated favor between the characters in a movie with a precision of 69.8%. In the future, we will improve the model to achieve higher accuracy.

Acknowledgements

This work was supported by JSPS KAKENHI Grant Numbers 24300005, 26330081, 26870201.

References

1. Adamic, L.A., Adar, E.: Friends and neighbors on the web. Social networks 25(3), 211-230 (2003)

2. Matsuo, Y., Tomobe, H., Hashida, K., Nakajima, H., Ishizuka, M.: Social network extraction from the web information. Transactions of the Japanese Society for Artificial Intelligence 20(1), 46-56 (2005)

3. Poon, H., Domingos, P.: Joint inference in information extraction. In: AAAI. vol. 7, pp. 913-918 (2007)

4. Richardson, M., Domingos, P.: Markov logic networks. Machine learning 62(1-2), 107-136 (2006)

5. Singla, P., Domingos, P.: Entity resolution with markov logic. In: Data Mining, 2006. ICDM'06. Sixth International Conference on. pp. 572-582. IEEE (2006)

6. Takamura, H., Inui, T., Okumura, M.: Extracting semantic orientations of words using spin model. In: Proc. the 43rd Annual Meeting on Association for Computational Linguistics. pp. 133-140. Association for Computational Linguistics (2005)

7. Tanaka, S., Okabe, M., Onai, R.: Interactive narrative summarization. Workshop on Interactive Systems and Software pp. 06-01-06-6 (2011)