<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Explainable Movie Recommendation Systems by using Story-based Similarity</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>O-Joun Lee</string-name>
          <email>concerto34@cau.ac.kr</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jason J. Jung</string-name>
          <email>j3ung@cau.ac.kr</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Engineering, Chung-Ang University</institution>
          ,
          <addr-line>Dongjak-gu, Seoul</addr-line>
          ,
          <country>Republic of Korea</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The goal of this paper is to provide story-based explanations for movie recommendation systems by combining multi-aspect explanation with narrative analysis methods. We explain how and why particular movies are similar based on the following two aspects: (i) the composition of movie characters and (ii) the interactions among the characters. These aspects correspond to story-based features of the movies that are extracted from character networks (i.e., social networks among the characters). By using the story-based features, we can explain why two arbitrary movies are similar or not. We anticipate that the proposed method could improve the explainability of recommender systems for movies.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>INTRODUCTION</title>
      <p>Various online services employ recommender
modules to provide users with the most relevant items.
However, with only a list of recommended items, it is difficult for
the users to understand why those items were selected. The users
must spend additional resources (mostly time or money) to
determine whether the recommended items are really
preferable. The problem becomes even worse when recommending
narrative works (e.g., movies, TV series, novels, and so on).
For example, we have all experienced giving up on a TV
series after a few episodes.</p>
      <p>
        Hence, explaining the reasons behind recommendations has been
regarded as an important research issue. There have been
various studies [
        <xref ref-type="bibr" rid="ref2 ref7">7, 2</xref>
        ] on building ‘explainable’ recommender
systems.
However, the previous studies could not identify what the
users will gain or feel from the recommended items. Most
of the studies only took into account the ‘scrutability’ of the
recommended items. Moreover, the content of the items was not
considered.
      </p>
      <p>For recommending narrative works, we
assume in this work that the content of the items directly affects the users’
preferences. We focus on analyzing and exploiting the major
characteristics of the content (such as the drawing styles of comics
or the stories of movies).</p>
      <p>We conducted a simple survey among 97 users of
‘webtoons’, a new medium that distributes comics through
the web. The survey consisted of one question, which
allowed multiple responses: “What are the criteria that affect your
preferences for webtoons?”. Most of the users gave two
criteria: stories (98.96%) and drawing styles (97.93%); 96.90%
of the users selected both criteria. We also interviewed
Lezhin Comics<sup>1</sup>, one of the major webtoon publishers
in Korea. To the question “Why do you not use recommender
engines for your platform?”, they answered that users mostly
consume webtoons within a few limited genres and drawing
styles.</p>
      <p>Based on these results, we found that the following two
aspects can be used for explaining recommendations:
(i) the stories contained in the narrative works and
(ii) how the stories are physically described.</p>
      <p>The goal of this study is to improve the explainability of
recommender systems by using the similarity among
stories. As this is an ongoing study, we limited our target
domain to movies. Nevertheless, by using character networks (i.e.,
social networks among the characters), we preserved the
expandability of the proposed method to other types of narrative
works.</p>
      <p>© 2018. Copyright for the individual papers remains with the authors.
Copying permitted for private and academic purposes. ExSS ’18, March
11, Tokyo, Japan.</p>
      <p><sup>1</sup> https://www.lezhin.com/ko/ <sup>2</sup> https://www.netflix.com/browse</p>
      <sec id="sec-1-1">
        <title>EXPLAINING STORY-BASED SIMILARITY</title>
        <p>Expected results of the proposed method are similar to what
Netflix<sup>2</sup> is already providing, as displayed in Fig. 1. Netflix
recommends sets of movies or TV series with some reasons;
e.g., “Because you watched Madam Secretary.”</p>
        <p>However, we aim for more detailed explanations than
Netflix’s. Our proposed method focuses on providing causal
evidence based on the movies’ stories. For example, when we
recommend “Cinderella” to the users, we can say “Cinderella
is similar to Snow White with a focus on the relationships among
the characters appearing in the movies.”</p>
        <p>To achieve these goals, this study pursued the following
objectives: (i) discovering story-based features, (ii) estimating
story-based similarity among the movies, (iii) providing
explainable recommendations on the basis of the story-based
similarity, and (iv) making the proposed method expandable
to other media.</p>
        <p>For the first two objectives, we attempted to discover
features from two aspects: the composition of the characters and
the interactions among the characters. Based on these story-based
features, we developed measurements that express the
differences between two arbitrary stories. Also, with a focus on
expandability, the data sources of the features were limited to
the character network.</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>Character Network</title>
      <p>
        Our previous studies [
        <xref ref-type="bibr" rid="ref1 ref3 ref4 ref5">1, 5, 4, 3</xref>
        ] have proposed methods based on social network analysis (SNA)
for computationally analyzing the movies’ stories.
We modeled the stories by using the character network, which
is defined as follows:
      </p>
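      <p>DEFINITION 1 (CHARACTER NETWORK). Suppose
that <inline-formula><tex-math>N</tex-math></inline-formula> is the number of characters that appear in a movie,
<inline-formula><tex-math>C_\alpha</tex-math></inline-formula>. When <inline-formula><tex-math>\mathcal{N}(C_\alpha)</tex-math></inline-formula> indicates the character network of <inline-formula><tex-math>C_\alpha</tex-math></inline-formula>,
<inline-formula><tex-math>\mathcal{N}(C_\alpha)</tex-math></inline-formula> can be described as a matrix in <inline-formula><tex-math>\mathbb{R}^{N \times N}</tex-math></inline-formula>. It consists
of <inline-formula><tex-math>N \times N</tex-math></inline-formula> components, which are the social affinities among the
characters, where <inline-formula><tex-math>a_{i,j}</tex-math></inline-formula> is the social affinity of <inline-formula><tex-math>c_i</tex-math></inline-formula> for <inline-formula><tex-math>c_j</tex-math></inline-formula>,
<inline-formula><tex-math>\mathcal{C}_\alpha</tex-math></inline-formula> is the universal set of characters that appear in <inline-formula><tex-math>C_\alpha</tex-math></inline-formula>, and
<inline-formula><tex-math>c_i</tex-math></inline-formula> is the <inline-formula><tex-math>i</tex-math></inline-formula>-th element of <inline-formula><tex-math>\mathcal{C}_\alpha</tex-math></inline-formula>.</p>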
      <p>In this study, we used the frequency of dialogues among the
characters as their social affinity. The dialogues
were extracted from the movies’ scripts, which were collected
from the Internet Movie Script Database (IMSDb)<sup>3</sup>.</p>
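      <p>As a minimal sketch of Definition 1, the following Python code builds the affinity matrix from a list of (speaker, listener) dialogue pairs. The function name build_character_network, the input format, and the sample dialogues are illustrative assumptions; how such pairs are parsed from a script is not shown here.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from collections import Counter


def build_character_network(dialogues):
    """Return (characters, matrix); matrix[i][j] counts dialogues from i to j.

    `dialogues` is an assumed input format: a list of (speaker, listener) pairs.
    """
    characters = sorted({name for pair in dialogues for name in pair})
    index = {name: i for i, name in enumerate(characters)}
    n = len(characters)
    matrix = [[0] * n for _ in range(n)]
    for (speaker, listener), freq in Counter(dialogues).items():
        matrix[index[speaker]][index[listener]] = freq
    return characters, matrix


# Invented sample data, for illustration only.
dialogues = [
    ("Cinderella", "Prince"),
    ("Prince", "Cinderella"),
    ("Cinderella", "Prince"),
    ("Stepmother", "Cinderella"),
]
characters, network = build_character_network(dialogues)
```
]]></preformat>
      <p>Here the matrix is directed (the affinity of one character for another), matching the wording of Definition 1; a symmetric variant would simply add each count in both directions.</p>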
    </sec>
    <sec id="sec-3">
      <title>Composition of the Characters</title>
      <p>
        Directors have to compose characters with a focus on
representing their stories. In other words, the users might
recognize the movies’ stories from the composition of
characters, whether intuitively or analytically. Hence, the
similarity among movies’ stories is recognizable from the
differences among their compositions of characters.
      </p>
      <p>
        In order to compare the compositions of characters, we
classified the characters with two criteria: (i) the importance of their
roles and (ii) their proximity to the protagonist. In our former
studies, we showed that the roles of the characters are easily
distinguishable. We classified the characters’ roles into three
categories (i.e., main characters, minor characters, and
extras) by using their centralities on the character networks [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
The centralities were estimated by a linear combination of
standard node centrality measurements: the degree centrality,
closeness centrality, and betweenness centrality.
      </p>
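      <p>The linear combination of the three centralities can be sketched as follows. This is only an illustration under stated assumptions: the network is treated as unweighted and undirected, the three weights are equal, and the function names and the sample graph are hypothetical. The weights and the thresholds that separate main characters, minor characters, and extras are not given here, so the sketch stops at a ranking.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from collections import deque


def shortest_path_data(adj, source):
    """BFS from source: distances, shortest-path counts, predecessors, visit order."""
    dist = {source: 0}
    sigma = {v: 0 for v in adj}
    sigma[source] = 1
    preds = {v: [] for v in adj}
    order = []
    queue = deque([source])
    while queue:
        v = queue.popleft()
        order.append(v)
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
            if dist[w] == dist[v] + 1:
                sigma[w] += sigma[v]
                preds[w].append(v)
    return dist, sigma, preds, order


def combined_centrality(adj, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Weighted sum of normalized degree, closeness, and betweenness.

    The equal weights are an assumption made for this sketch.
    """
    n = len(adj)
    between = {v: 0.0 for v in adj}
    closeness = {}
    for s in adj:
        dist, sigma, preds, order = shortest_path_data(adj, s)
        total = sum(dist.values())
        # Closeness over the nodes reachable from s.
        closeness[s] = (len(dist) - 1) / total if total else 0.0
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):  # Brandes' dependency accumulation
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                between[w] += delta[w]
    pairs = (n - 1) * (n - 2)  # ordered pairs of other nodes (normalization)
    w_d, w_c, w_b = weights
    return {
        v: w_d * len(adj[v]) / (n - 1)
        + w_c * closeness[v]
        + w_b * (between[v] / pairs if pairs else 0.0)
        for v in adj
    }


# Invented star-shaped network: one hub connected to three other characters.
adj = {
    "Hero": ["Ally", "Rival", "Guard"],
    "Ally": ["Hero"],
    "Rival": ["Hero"],
    "Guard": ["Hero"],
}
scores = combined_centrality(adj)
ranking = sorted(scores, key=scores.get, reverse=True)
```
]]></preformat>
      <p>In this toy network the hub obtains the maximum value on all three measures, so it is ranked first; role categories would then be assigned by cutting the ranking at thresholds chosen for the dataset.</p>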
      <p>
        In addition, we distinguished the protagonists, who receive the
most attention, and the antagonists, who receive the second most
[
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. We considered the remaining characters as tritagonists. The
tritagonists were categorized into three sides: friendly, neutral,
and hostile toward the protagonists.
      </p>
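      <p>The categorization was conducted by measuring the social ties
among the characters. Given three characters (<inline-formula><tex-math>c_P</tex-math></inline-formula>, <inline-formula><tex-math>c_A</tex-math></inline-formula>, and
<inline-formula><tex-math>c_i</tex-math></inline-formula>), where <inline-formula><tex-math>c_P</tex-math></inline-formula> is the protagonist and <inline-formula><tex-math>c_A</tex-math></inline-formula> is the antagonist, we can identify
which of the two is closer to <inline-formula><tex-math>c_i</tex-math></inline-formula> by comparing
<inline-formula><tex-math>c_i</tex-math></inline-formula>’s social ties with <inline-formula><tex-math>c_P</tex-math></inline-formula> and <inline-formula><tex-math>c_A</tex-math></inline-formula>. It can be formulated as
<disp-formula><label>(1)</label><tex-math><![CDATA[c_i \in \begin{cases} P, & \text{if } T(c_i, c_P) > T(c_i, c_A) \text{ and } T(c_i, c_P) > \operatorname{median}_{\forall c_j} T(c_j, c_P), \\ A, & \text{if } T(c_i, c_A) > T(c_i, c_P) \text{ and } T(c_i, c_A) > \operatorname{median}_{\forall c_j} T(c_j, c_A), \\ N, & \text{otherwise,} \end{cases}]]></tex-math></disp-formula>
where <inline-formula><tex-math>T(c_j, c_k)</tex-math></inline-formula> indicates the degree of the social tie between <inline-formula><tex-math>c_j</tex-math></inline-formula>
and <inline-formula><tex-math>c_k</tex-math></inline-formula>, <inline-formula><tex-math>P</tex-math></inline-formula> and <inline-formula><tex-math>A</tex-math></inline-formula> indicate the sets of characters who are friendly
with the protagonist and the antagonist, respectively, <inline-formula><tex-math>N</tex-math></inline-formula> denotes
the set of characters who take neutral positions between
the protagonist and the antagonist, and <inline-formula><tex-math>\operatorname{median}_{\forall c_j} T(c_j, c_P)</tex-math></inline-formula> and
<inline-formula><tex-math>\operatorname{median}_{\forall c_j} T(c_j, c_A)</tex-math></inline-formula> refer to the median values of the social ties
of all the characters with <inline-formula><tex-math>c_P</tex-math></inline-formula> and <inline-formula><tex-math>c_A</tex-math></inline-formula>, respectively.</p>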
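      <p>A minimal sketch of the rule in Eq. 1 is given below. It assumes that the social ties are already available as a symmetric lookup table and that the medians are taken over all characters other than the protagonist and the antagonist; the function name classify_sides and the tie values are invented for illustration.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from statistics import median


def classify_sides(ties, protagonist, antagonist):
    """Assign every other character to P (friendly), A (hostile), or N (neutral).

    `ties` maps (character, character) pairs to a tie strength; it is treated
    as symmetric, which is an assumption of this sketch.
    """
    def tie(a, b):
        return ties.get((a, b), ties.get((b, a), 0.0))

    everyone = {c for pair in ties for c in pair}
    others = sorted(everyone - {protagonist, antagonist})
    median_p = median(tie(c, protagonist) for c in others)
    median_a = median(tie(c, antagonist) for c in others)

    sides = {"P": set(), "A": set(), "N": set()}
    for c in others:
        t_p, t_a = tie(c, protagonist), tie(c, antagonist)
        if t_p > t_a and t_p > median_p:
            sides["P"].add(c)
        elif t_a > t_p and t_a > median_a:
            sides["A"].add(c)
        else:
            sides["N"].add(c)
    return sides


# Invented tie strengths, for illustration only.
ties = {
    ("Fairy", "Cinderella"): 9, ("Fairy", "Stepmother"): 1,
    ("Stepsister", "Cinderella"): 2, ("Stepsister", "Stepmother"): 8,
    ("Coachman", "Cinderella"): 3, ("Coachman", "Stepmother"): 3,
}
sides = classify_sides(ties, "Cinderella", "Stepmother")
```
]]></preformat>
      <p>With these values, the character whose tie with the protagonist is both the stronger of the two and above the median falls into P, the mirrored case falls into A, and the character with equal ties stays neutral.</p>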
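      <p>To measure the degree of social ties, we considered the
frequency of interactions and the number of paths between the target
characters. It is formulated as:
<disp-formula><label>(2)</label><tex-math>T(c_i, c_j) = \sum_{\forall p_k \in P_{i,j}} \prod_{l=2}^{|p_k|} f(n_{l-1}, n_l),</tex-math></disp-formula>
where <inline-formula><tex-math>P_{i,j}</tex-math></inline-formula> is the set of possible paths between <inline-formula><tex-math>c_i</tex-math></inline-formula> and <inline-formula><tex-math>c_j</tex-math></inline-formula>, <inline-formula><tex-math>p_k</tex-math></inline-formula>
indicates the <inline-formula><tex-math>k</tex-math></inline-formula>-th path in <inline-formula><tex-math>P_{i,j}</tex-math></inline-formula>, <inline-formula><tex-math>n_l</tex-math></inline-formula> denotes the <inline-formula><tex-math>l</tex-math></inline-formula>-th node (character)
on <inline-formula><tex-math>p_k</tex-math></inline-formula>, <inline-formula><tex-math>|p_k|</tex-math></inline-formula> is the number of nodes included in <inline-formula><tex-math>p_k</tex-math></inline-formula>, and
<inline-formula><tex-math>f(n_{l-1}, n_l)</tex-math></inline-formula> means the weighting value (interaction frequency)
between <inline-formula><tex-math>n_{l-1}</tex-math></inline-formula> and <inline-formula><tex-math>n_l</tex-math></inline-formula>.</p>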
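      <p>Eq. 2 can be sketched as a depth-first enumeration of paths. The sketch assumes that the “possible paths” are simple paths (no character is visited twice) and that the interaction frequencies are stored in a nested dictionary; the function name social_tie and the sample frequencies are hypothetical.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def social_tie(freq, source, target):
    """Sum, over all simple paths, of the product of edge frequencies."""
    def walk(node, visited, product):
        if node == target:
            return product
        total = 0.0
        for nxt, weight in freq.get(node, {}).items():
            if nxt not in visited and weight:
                total += walk(nxt, visited | {nxt}, product * weight)
        return total

    return walk(source, {source}, 1.0)


# Invented symmetric interaction frequencies among three characters.
freq = {
    "Anna": {"Ben": 2, "Cara": 1},
    "Ben": {"Anna": 2, "Cara": 3},
    "Cara": {"Anna": 1, "Ben": 3},
}
tie_anna_cara = social_tie(freq, "Anna", "Cara")
```
]]></preformat>
      <p>For this triangle there are two simple paths from Anna to Cara: the direct edge (product 1) and the path through Ben (product 2 × 3), so the tie is their sum. Enumerating all simple paths grows quickly with the network size, so a length limit may be needed on larger networks.</p>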
      <p>By combining the two classification criteria, we categorized
the characters into six groups, as displayed in Fig. 2.
The number of characters in each category was represented as
a 2 × 3 matrix. We call this matrix a ‘character composition
matrix’. As a naive approach, the difference between two
movies can be estimated by the Frobenius distance between
their character composition matrices as:
<disp-formula><label>(3)</label><tex-math>D_C(C_\alpha, C_\beta) = \lVert C_\alpha - C_\beta \rVert_F ,</tex-math></disp-formula>
where <inline-formula><tex-math>\lVert \cdot \rVert_F</tex-math></inline-formula> denotes the Frobenius norm. Here, <inline-formula><tex-math>D_C(C_\alpha, C_\beta)</tex-math></inline-formula>
takes a smaller value as <inline-formula><tex-math>C_\alpha</tex-math></inline-formula> and <inline-formula><tex-math>C_\beta</tex-math></inline-formula> have more similar numbers of
characters in all the categories. This means that <inline-formula><tex-math>D_C</tex-math></inline-formula> is highly
affected by the number of characters.</p>
      <p>[Figure 2: The six character categories, arranged by importance (main, minor) and by proximity (protagonist side, neutral, antagonist side): friendly main characters, neutral main characters, hostile main characters, friendly minor characters, neutral minor characters, and hostile minor characters.]</p>
      <sec id="sec-3-5">
        <p>Therefore, we normalized the character composition matrix
as:
<disp-formula><label>(4)</label><tex-math>C_\alpha^{Norm} = \frac{C_\alpha}{|C_\alpha|},</tex-math></disp-formula>
where <inline-formula><tex-math>|C_\alpha|</tex-math></inline-formula> is the number of characters that appear in <inline-formula><tex-math>C_\alpha</tex-math></inline-formula>.
By comparing the normalized composition matrices, we obtained
a scale-tolerant distance. In addition, the numbers of characters in the
two movies were compared directly. These can be formulated as:
<disp-formula><label>(5)</label><tex-math>D_C^{Norm}(C_\alpha, C_\beta) = \lVert C_\alpha^{Norm} - C_\beta^{Norm} \rVert_F ,</tex-math></disp-formula>
<disp-formula><label>(6)</label><tex-math>D_C^{Scale}(C_\alpha, C_\beta) = \frac{\max(|C_\alpha|, |C_\beta|)}{\min(|C_\alpha|, |C_\beta|)}.</tex-math></disp-formula></p>
        <p>Thus, the explanations for the users can be composed by
considering (i) proximity and (ii) importance. For example,
‘The Day After Tomorrow (2004)’ and ‘Gravity (2013)’ are
both disaster movies. In these two movies, nature takes
the role of the antagonist. However, the tritagonists in ‘The Day
After Tomorrow (2004)’ are mostly main characters, while the
protagonist of ‘Gravity (2013)’ appears almost alone. Also,
the number of characters in ‘The Day After Tomorrow (2004)’
is larger than that in ‘Gravity (2013)’, because ‘The Day
After Tomorrow (2004)’ is closer to a drama dealing
with family affection, whereas ‘Gravity (2013)’ is also a sort of
thriller.</p>
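        <p>The three distances in Eqs. 3, 5, and 6 can be sketched in a few lines of Python. The sketch assumes that a composition matrix is a 2 × 3 list of counts (rows: main and minor; columns: friendly, neutral, and hostile) and that the number of characters equals the sum of its entries; the function names and the two sample matrices are invented for illustration.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import math


def frobenius_distance(a, b):
    """Frobenius norm of the element-wise difference of two matrices."""
    return math.sqrt(
        sum((x - y) ** 2 for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))
    )


def character_count(matrix):
    """Assumed here to be the sum of all entries of the composition matrix."""
    return sum(sum(row) for row in matrix)


def normalize(matrix):
    """Divide every entry by the number of characters (Eq. 4)."""
    total = character_count(matrix)
    return [[x / total for x in row] for row in matrix]


def scale_distance(a, b):
    """Ratio of the larger cast size to the smaller one (Eq. 6)."""
    n_a, n_b = character_count(a), character_count(b)
    return max(n_a, n_b) / min(n_a, n_b)


# Invented composition matrices; movie_b has the same proportions, doubled.
movie_a = [[2, 0, 1], [3, 2, 2]]
movie_b = [[4, 0, 2], [6, 4, 4]]

d_raw = frobenius_distance(movie_a, movie_b)
d_norm = frobenius_distance(normalize(movie_a), normalize(movie_b))
d_scale = scale_distance(movie_a, movie_b)
```
]]></preformat>
        <p>The raw distance is large only because one cast is twice the size of the other; after normalization the two compositions coincide, and the size difference is reported separately by the scale distance.</p>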
        <p><bold>Interactions among the Characters</bold></p>
        <p>Although the protagonists and antagonists interact with most
of the characters, the other characters are mostly confined to particular
communities. For example, acquaintances of the protagonists
usually interact and appear with the protagonists. If they start
appearing frequently with the antagonists, this possibly
indicates that a conflict or a crisis (e.g., betrayal, conversion,
kidnapping, etc.) is likely to happen. In other words, the interactions
among the characters’ groups reflect how the stories are
developed.</p>
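        <p>Based on this intuition, we compared the stories by using two
criteria: (i) frequency: the proportion of inter-group interactions
and (ii) aggressiveness: the external adjacency of the groups.</p>
        <p>To utilize the two metrics, we had to discover the characters’
groups. The groups were built on the basis of <inline-formula><tex-math>P</tex-math></inline-formula>, <inline-formula><tex-math>A</tex-math></inline-formula>, and <inline-formula><tex-math>N</tex-math></inline-formula>, which
were composed in Sect. 2.2. The procedure for discovering the
groups can be summarized as follows.</p>
        <list list-type="order">
          <list-item><p>Subsume extras under <inline-formula><tex-math>P</tex-math></inline-formula>, <inline-formula><tex-math>A</tex-math></inline-formula>, and <inline-formula><tex-math>N</tex-math></inline-formula> with the same method that is used for the main and minor characters.</p></list-item>
          <list-item><p>Calculate the internal compactness and the external adjacency of each group.</p></list-item>
          <list-item><p>If the external adjacency of a particular group is too high compared with its internal compactness, partition the group.</p></list-item>
          <list-item><p>Repeat Steps 2 and 3 until the groups have adequate quality.</p></list-item>
        </list>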
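        <p>The internal compactness is measured on the basis of the internal
interactions’ frequency and the in-degree of the groups, as:
<disp-formula><label>(7)</label><tex-math>I(G_k) = \sum_{\substack{\forall c_i \in G_k, \\ \forall c_j \in G_k, \\ c_i \neq c_j}} a_{i,j} \times \frac{\sum_{c_l \in G_k} \delta(j,l)}{|G_k|},</tex-math></disp-formula>
<disp-formula><label>(8)</label><tex-math><![CDATA[\delta(j,l) = \begin{cases} 1, & \text{if } a_{j,l} \neq 0, \\ 0, & \text{otherwise,} \end{cases}]]></tex-math></disp-formula>
where <inline-formula><tex-math>G_k</tex-math></inline-formula> denotes the <inline-formula><tex-math>k</tex-math></inline-formula>-th group, <inline-formula><tex-math>|G_k|</tex-math></inline-formula> indicates the number of
characters included in <inline-formula><tex-math>G_k</tex-math></inline-formula>, and <inline-formula><tex-math>\delta(j,l)</tex-math></inline-formula> refers to an indicator
function that indicates the existence of interactions between <inline-formula><tex-math>c_j</tex-math></inline-formula> and
<inline-formula><tex-math>c_l</tex-math></inline-formula>. In contrast, the external adjacency was estimated
from the external interactions’ frequency and the out-degree of
the groups:
<disp-formula><label>(9)</label><tex-math>E(G_k) = \sum_{\substack{\forall c_i \in G_k, \\ \forall c_j \notin G_k}} a_{i,j} \times \frac{\sum_{c_l \notin G_k} \delta(j,l)}{\sum_{\forall c_l} \delta(j,l)}.</tex-math></disp-formula></p>
        <p>In order to compare <inline-formula><tex-math>I(G_k)</tex-math></inline-formula> with <inline-formula><tex-math>E(G_k)</tex-math></inline-formula>, we defined a unified
metric as a linear combination of <inline-formula><tex-math>I(G_k)</tex-math></inline-formula> and <inline-formula><tex-math>E(G_k)</tex-math></inline-formula>. It can
be formulated as:
<disp-formula><label>(10)</label><tex-math>Q(G_k) = \alpha \times I(G_k) - (1 - \alpha) \times E(G_k),</tex-math></disp-formula>
where <inline-formula><tex-math>\alpha \in [0, 1]</tex-math></inline-formula> is a weighting factor; in this study, we set
<inline-formula><tex-math>\alpha = 0.80</tex-math></inline-formula>. Based on <inline-formula><tex-math>Q(G_k)</tex-math></inline-formula>, we determined whether <inline-formula><tex-math>G_k</tex-math></inline-formula> had
to be reconstructed or not.</p>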
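        <p>A minimal sketch of the group quality computation follows. It reads the internal compactness as the sum over ordered pairs of distinct members of the affinity multiplied by the share of the partner’s in-group links, and the external adjacency as the analogous sum over pairs that cross the group boundary. This reading of the formulas, the function names, and the sample matrix are assumptions; characters are referred to by their row index in the affinity matrix.</p>
        <preformat preformat-type="code"><![CDATA[
```python
def delta(network, j, l):
    """Indicator: 1 if characters j and l interact at all, else 0."""
    return 1 if network[j][l] != 0 else 0


def internal_compactness(network, group):
    size = len(group)
    total = 0.0
    for i in group:
        for j in group:
            if i != j:
                inside_links = sum(delta(network, j, l) for l in group)
                total += network[i][j] * inside_links / size
    return total


def external_adjacency(network, group):
    n = len(network)
    members = set(group)
    outside = [c for c in range(n) if c not in members]
    total = 0.0
    for i in group:
        for j in outside:
            all_links = sum(delta(network, j, l) for l in range(n))
            if all_links:  # skip isolated characters to avoid dividing by zero
                outside_links = sum(delta(network, j, l) for l in outside)
                total += network[i][j] * outside_links / all_links
    return total


def quality(network, group, alpha=0.8):
    """Unified metric: alpha * I - (1 - alpha) * E, with alpha = 0.80 as in the text."""
    return (alpha * internal_compactness(network, group)
            - (1 - alpha) * external_adjacency(network, group))


# Invented symmetric affinity matrix for four characters.
network = [
    [0, 4, 1, 0],
    [4, 0, 0, 0],
    [1, 0, 0, 2],
    [0, 0, 2, 0],
]
group = [0, 1]
i_value = internal_compactness(network, group)
e_value = external_adjacency(network, group)
q_value = quality(network, group)
```
]]></preformat>
        <p>In this example the first two characters interact mostly with each other, so the internal term dominates and the quality is positive; a group whose members talk mainly to outsiders would receive a low or negative value and become a candidate for partitioning.</p>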
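        <p>Therefore, if <inline-formula><tex-math>Q(G_k)</tex-math></inline-formula> was lower than a user-defined minimum
threshold, <inline-formula><tex-math>\theta</tex-math></inline-formula>, we partitioned <inline-formula><tex-math>G_k</tex-math></inline-formula> into several groups. The
method for partitioning the group is the same as Eq. 1.
It can be summarized as follows:</p>
        <list list-type="order">
          <list-item><p>Choose the top two characters as centers for the new groups based on the characters’ centralities.</p></list-item>
          <list-item><p>Partition the target group into three new groups by using the same method as Eq. 1.</p></list-item>
          <list-item><p>Calculate <inline-formula><tex-math>Q(G_k)</tex-math></inline-formula> of the new groups. If the average quality of the new groups is worse than that of the previous ones, restore the previous groups and designate the next-ranked characters as the centers.</p></list-item>
        </list>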
        <p>From the characters’ groups, we measure the proposed
metrics. The frequency can be measured by the proportion of
inter-group interactions in the total interactions. It can be
formulated as:
<disp-formula><label>(11)</label><tex-math>F(C_\alpha) = \frac{\sum_{\forall G_k} \sum_{\substack{\forall c_i \in G_k, \\ \forall c_j \notin G_k}} a_{i,j}}{\sum_{\forall c_i} \sum_{\forall c_j} a_{i,j}},</tex-math></disp-formula>
where <inline-formula><tex-math>G_k</tex-math></inline-formula> indicates a group of the characters.</p>
        <p>In order to formulate the aggressiveness, we considered both
the external adjacency and the internal compactness of the
groups. It is formulated as:
<disp-formula><label>(12)</label><tex-math>A(C_\alpha) = \sum_{\forall G_k} E(G_k) - \sum_{\forall G_k} I(G_k).</tex-math></disp-formula></p>
        <p>Therefore, the explanations for the users can be generated by
considering (i) frequency and (ii) aggressiveness. Using the
same example as in Sect. 2.2, the stories of ‘The Day After
Tomorrow (2004)’ and ‘Gravity (2013)’ are both led by two
groups. However, the groups in ‘The Day After Tomorrow (2004)’
are highly separated, while the groups in ‘Gravity (2013)’
interact with each other relatively frequently. In addition, the groups
within ‘The Day After Tomorrow (2004)’ commonly have high
internal compactness and low external adjacency, since this
movie describes an ‘adventure’, where a father rescues his son.
On the other hand, the groups in ‘Gravity (2013)’ show the opposite
characteristics, because ‘Gravity (2013)’ depicts a person who
has to save herself, relying on the tritagonists’ advice.</p>
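        <p>The two story-level metrics can be sketched as below. The frequency is computed directly from an affinity matrix and a partition of character indices into groups; the aggressiveness takes per-group pairs of (internal compactness, external adjacency) that are assumed to have been computed beforehand. The function names, the sample matrix, and the per-group values are illustrative assumptions.</p>
        <preformat preformat-type="code"><![CDATA[
```python
def interaction_frequency(network, groups):
    """Share of the total affinity that crosses group boundaries."""
    total = sum(sum(row) for row in network)
    crossing = 0
    for group in groups:
        members = set(group)
        for i in group:
            for j in range(len(network)):
                if j not in members:
                    crossing += network[i][j]
    return crossing / total if total else 0.0


def aggressiveness(per_group):
    """Sum of external adjacencies minus sum of internal compactnesses.

    `per_group` is a list of (internal, external) pairs, one per group.
    """
    return sum(e for _, e in per_group) - sum(i for i, _ in per_group)


# Invented affinity matrix and grouping of four characters.
network = [
    [0, 4, 1, 0],
    [4, 0, 0, 0],
    [1, 0, 0, 2],
    [0, 0, 2, 0],
]
groups = [[0, 1], [2, 3]]
frequency = interaction_frequency(network, groups)

# Invented (internal, external) values for two groups.
story_aggressiveness = aggressiveness([(4.0, 0.5), (2.0, 0.5)])
```
]]></preformat>
        <p>A low frequency together with a strongly negative aggressiveness corresponds to well-separated, compact groups, which is the pattern described for ‘The Day After Tomorrow (2004)’ in the example.</p>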
      </sec>
    </sec>
    <sec id="sec-4">
      <title>CONCLUSION</title>
      <p>We proposed a method for explaining why two arbitrary
movies are similar or not. This method can improve the explainability of
recommender systems for movies. It is also expandable to other
media and formats, since it is built on the
character network.</p>
      <p>Nevertheless, the proposed method has not yet been verified with a
large-scale dataset. Our future research will focus on
building appropriate datasets and conducting an evaluation
of the proposed method. We also plan to consider
multiple features of the movies (e.g., genre and tempo) for
generating explanations.</p>
    </sec>
    <sec id="sec-5">
      <title>ACKNOWLEDGMENTS</title>
      <p>This work was supported by the National Research Foundation
of Korea (NRF) grant funded by the Korea government (MSIP)
(NRF-2017R1A41015675).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>Tran Quang Dieu</string-name>,
          <string-name>Dosam Hwang</string-name>,
          <string-name>O-Joun Lee</string-name>, and
          <string-name>Jason J. Jung</string-name>.
          <year>2016</year>.
          <article-title>A Novel Method for Extracting Dynamic Character Network from Movie</article-title>.
          <source>In Proceedings of the 7th EAI International Conference on Big Data Technologies and Applications</source>.
          Seoul, Korea.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>Xiangnan He</string-name>,
          <string-name>Tao Chen</string-name>,
          <string-name>Min-Yen Kan</string-name>, and
          <string-name>Xiao Chen</string-name>.
          <year>2015</year>.
          <article-title>TriRank: Review-aware Explainable Recommendation by Modeling Aspects</article-title>.
          <source>In Proceedings of the 24th ACM International on Conference on Information and Knowledge Management (CIKM 2015)</source>,
          James Bailey, Alistair Moffat, Charu C. Aggarwal, Maarten de Rijke, Ravi Kumar, Vanessa Murdock, Timos K. Sellis, and Jeffrey Xu Yu (Eds.). ACM,
          <fpage>1661</fpage>-<lpage>1670</lpage>.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>Jai E. Jung</string-name>,
          <string-name>O-Joun Lee</string-name>,
          <string-name>Eun-Soon You</string-name>, and
          <string-name>Myoung-Hee Nam</string-name>.
          <year>2017</year>.
          <article-title>A computational model of transmedia ecosystem for story-based contents</article-title>.
          <source>Multimedia Tools and Applications</source>
          <volume>76</volume>,
          <issue>8</issue>
          (2017),
          <fpage>10371</fpage>-<lpage>10388</lpage>.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>O-Joun Lee</string-name> and
          <string-name>Jason J. Jung</string-name>.
          To Appear.
          <article-title>Modeling Affective Character Network for Story Analytics</article-title>.
          <source>Future Generation Computer Systems</source>
          (To Appear).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>Quang Dieu Tran</string-name>,
          <string-name>Dosam Hwang</string-name>,
          <string-name>O-Joun Lee</string-name>, and
          <string-name>Jai E. Jung</string-name>.
          <year>2017</year>.
          <article-title>Exploiting Character Networks for Movie Summarization</article-title>.
          <source>Multimedia Tools and Applications</source>
          <volume>76</volume>,
          <issue>8</issue>
          (Apr 2017),
          <fpage>10357</fpage>-<lpage>10369</lpage>.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>Quang Dieu Tran</string-name> and
          <string-name>Jai E. Jung</string-name>.
          <year>2015</year>.
          <article-title>CoCharNet: Extracting Social Networks using Character Co-occurrence in Movies</article-title>.
          <source>Journal of Universal Computer Science</source>
          <volume>21</volume>,
          <issue>6</issue>
          (2015),
          <fpage>796</fpage>-<lpage>815</lpage>.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>Yongfeng Zhang</string-name>,
          <string-name>Guokun Lai</string-name>,
          <string-name>Min Zhang</string-name>,
          <string-name>Yi Zhang</string-name>,
          <string-name>Yiqun Liu</string-name>, and
          <string-name>Shaoping Ma</string-name>.
          <year>2014</year>.
          <article-title>Explicit factor models for explainable recommendation based on phrase-level sentiment analysis</article-title>.
          <source>In Proceedings of The 37th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2014)</source>,
          Shlomo Geva, Andrew Trotman, Peter Bruza, Charles L. A. Clarke, and Kalervo Järvelin (Eds.). ACM, Gold Coast, QLD, Australia,
          <fpage>83</fpage>-<lpage>92</lpage>.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>