Actor Identification and Relevance Filtering in Movie Reviews. CEUR Workshop Proceedings, Vol-1594, paper 17: https://ceur-ws.org/Vol-1594/paper17.pdf
       Actor Identification and Relevance Filtering in Movie
                              Reviews

                                                           Julia Romberg
                                               Heinrich-Heine-Universität Düsseldorf
                                                       Institut für Informatik
                                                        Universitätsstraße 1
                                                  40225 Düsseldorf, Deutschland
                                            Julia.Romberg@uni-duesseldorf.de

ABSTRACT
With a large amount of data it is not always useful to run analyses on the entire corpus. Sometimes it is helpful to preprocess the data first by filtering out the relevant information in order to form a fitting basis for the examination of particular aspects such as sentiment analysis. As a result, the amount of data that needs to be explored is reduced and concentrated, and thus the performance is enhanced. For example, a correct recognition of the rating of acting performances in movie reviews assumes that only judgements on the movie's actors are used as a basis.
In this paper, we discuss different approaches for a rule-based selection of sentences from movie reviews. Our aim is the filtering of sentences in order to facilitate analyses of single actors. Thereby, actor identification is used to preselect a set of sentences that mention a specific actor. This is done individually for every actor involved in the movie. Furthermore, filtering is used to identify sentences that not only mention an actor but also state facts about him. To evaluate the developed methods, a test corpus consisting of ten movies with 30 reviews each, taken from the online movie platform IMDb, was built. Based on this data and the presented feature selection rules, an average F1 score of 77.9% is achieved as the best result.

Categories and Subject Descriptors
I.2.7 [ARTIFICIAL INTELLIGENCE]: Natural Language Processing—language parsing and understanding, text analysis; D.2.8 [DOCUMENT AND TEXT PROCESSING]: Document Capture—document analysis; H.2.8 [DATABASE MANAGEMENT]: Database Applications—data mining

Keywords
Natural Language Processing, Text Mining, Text Analysis, Coreference Resolution, Movie Reviews, IMDb

28th GI-Workshop on Foundations of Databases (Grundlagen von Datenbanken), 24.05.2016 - 27.05.2016, Nörten-Hardenberg, Germany. Copyright is held by the author/owner(s).

1. INTRODUCTION
The internet is a highly frequented medium for the description and rating of a wide variety of matters. Products such as furniture, books, and movies can be reviewed by users to reflect their experiences and to help other users decide whether they should buy a product or not. In addition, review platforms provide the possibility of reading different opinions and comparing them with one's own. Especially in the field of entertainment there is a need to exchange opinions and notes after watching a movie or a series, reading a book or listening to a music album, and the resulting reviews offer great potential for analyses.
In this work we focus on movie reviews. Movie reviews mostly consist of a summary of the movie plot and an overall assessment of the movie. In addition, many reviewers expand on the cast and, hence, on the acting performance the actors and actresses show in this particular movie. On the basis of reviews, various aspects can be analysed. One way to support a user's decision for or against watching a movie could be an automatic text analysis that gives a general overview of the popularity of the movie instead of requiring the user to read several reviews.
At least of equal interest is the individual performance of the participating actors. By examining the reviewers' impressions, the quality of acting can be investigated and, as a result, recommendations can be given that match the user's preferences. Furthermore, general advice can be given, such as film rankings for single actors and overall rankings for outstanding performances in movies. In order to facilitate these examinations and to increase efficiency, it is necessary to extract sentences that are of interest for the later analysis.
Mainly, we want to identify actors in movie reviews. This is the basis for analyses of actors' performances and acting skills. Therefore we use name recognition and coreference resolution approaches. To select informative sentences, filtering techniques are then applied.

2. RELATED WORK
In the field of movie review mining, different aspects have been examined. Especially opinion mining and sentiment analysis are important issues. Overall sentiment classification of movie review documents into negative and positive opinions [10, 12, 5] or into deeper levels of granularity [6, 13] has been investigated intensely.
Closely related to opinion mining is informative review summarisation. This has been analysed for product and customer reviews in general [1, 3] and for movie reviews [9, 14] in particular. In [3], product features in customer reviews are mined, positive and negative phrases are identified, and the discovered information is then used to summarize the reviews. Thereby, product features are product characteristics such as picture quality for a digital camera. As the structure of movie reviews differs from that of product and customer reviews, extra research was done. In [9], overall sentiment analysis on movie reviews is done, but only on the subjective parts of the documents. To extract those parts, techniques for finding minimum cuts in graphs are used. Subsequently, reviews are summarized for a cleaner representation of the polarity. These concepts are not applicable for our purpose as they focus on overall sentiment analysis.
Concerning the approaches evaluated in our paper, [14] is of interest: there, movie reviews are mined in order to determine whether opinions are negative or positive, and the reviews are summarized. Thereby, features on which the reviewers express their opinions are extracted. Features are split into different classes such as "screenplay" and "music and sound effects". The class "actor and actress" is treated explicitly. To ease the actor identification, movie cast lists are used. Then first name only, last name only, full name and abbreviations are used to identify a feature. The obtained features are used to mine feature-opinion pairs on which the sentiment analysis is based. In contrast to our work, the focus in [14] is not on actors in particular. Instead, a number of feature classes is used to examine opinions, for which reason more general approaches were chosen. Actor identification through name recognition was used. In [4], the opinion mining approach presented in [14] is extended with anaphora resolution. However, name recognition that accounts for spelling mistakes is not further examined. We also add coreference resolution to improve the identification, but instead of searching directly for pairs of opinion targets and opinion words, we refer to a list of words that may indicate irrelevant identifications for eventual sentiment analysis.

3. ACTOR IDENTIFICATION ON SENTENCE LEVEL
Aiming for an extraction of sentences that refer to specific actors, we first have to define what we consider as an extraction. Under the assumption that we have at least one review, a set of sentences exists. Not every sentence contains information related to an actor. Thus, a filtering of relevant sentences is needed. As the focus is on individual actors, the selection also has to be done distinctly:

Definition 1. Let S be the set of all sentences of a movie review. For any actor aj there exists a subset Saj ⊆ S of relevant sentences.

Below, after the definition of relevance, approaches for actor identification and filtering of irrelevant sentences are presented.

3.1 Definition of relevant sentences
First, the relevance of a sentence according to a certain actor has to be clarified. Not every mention of an actor really refers to him. For the purpose of this study, only sentences that not only refer to but also contain information about an actor are of interest.

Definition 2. Let si ∈ S be any sentence of a movie review and let aj be an actor. Then si ∈ Saj if and only if si contains information about the actor himself.

Thereby, knowledge about the played role is excluded, since this is assumed to be previously known and gives no further description of the actor and his play. In addition, only cases in which it is clear that the actor is meant are regarded.
To clarify the difference between relevant and irrelevant information concerning the played role, consider the following sentences:

• Leonardo DiCaprio plays Hugh Glass.
• Leonardo DiCaprio plays Hugh Glass very well.

The first sentence only states the relation of DiCaprio and his role in the movie "The Revenant" without any further information regarding the actor, whereas the latter additionally describes the quality of DiCaprio's play. Therefore, only "Leonardo DiCaprio plays Hugh Glass very well." is relevant for us. Equally, sentences like "Hugh Glass is played by Leonardo DiCaprio" and "Hugh Glass (Leonardo DiCaprio) is out for vengeance." are not relevant for Leonardo DiCaprio according to Definition 2.

3.2 Approaches
Several approaches for the extraction of relevant sentences are developed. The first approach focuses on the explicit use of names, the next one considers coreferences, and the last approach is about filtering irrelevant sentences.

3.2.1 Names
The most naive approach for finding an actor in a sentence is the search for his name. The full name of an actor can be considered as well as only parts of the actor's name. In general, a person's full name is used in the beginning of a text passage for the purpose of introduction. Then, the first or the last name can serve as a representative.
In many cases movie reviews contain spelling mistakes. To take that into account, the Levenshtein distance [7] is used to allow deviations: the Levenshtein distance of two words is the minimum number of edit operations that are necessary to convert one word into the other. Permitted edit operations are the insertion, replacement and deletion of characters.

3.2.2 Coreferences
On the one hand, names and parts of names are important clues for the reference to an actor. On the other hand, personal pronouns are used as well to substitute full names. According to A. Radford [11], two expressions are coreferential if they refer to the same entity. In the present case, entities are actors, and all the expressions building a coreference towards a specific actor within a movie review are to be found.
For example, given the following sentences: "DiCaprio plays Jack in James Cameron's Titanic." and "This year he finally won the Academy Award for Best Actor.". Both sentences are about Leonardo DiCaprio as the referred entity, and the coreferential expressions are "DiCaprio" in the first one and "he" in the second one.
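As an illustration (our sketch, not the authors' implementation), the Levenshtein distance described in section 3.2.1 can be computed with dynamic programming over the two words:

```python
# Illustrative sketch of the Levenshtein distance from section 3.2.1.
# Permitted operations: insertion, deletion, and replacement of characters.

def levenshtein(a: str, b: str) -> int:
    # prev[j] holds the edit distance between the processed prefix of a and b[:j]
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion of ca
                curr[j - 1] + 1,           # insertion of cb
                prev[j - 1] + (ca != cb),  # replacement (free if characters match)
            ))
        prev = curr
    return prev[-1]
```

On the paper's examples, `levenshtein("Gossling", "Gosling")` is 1 (one deleted "s") and `levenshtein("Ron", "on")` is also 1, which is why short first names react so strongly to allowed deviations.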







Through intense examination of the test corpus (see section 4.1), "he" and "she" were identified as the most frequently used single personal pronouns. For this reason, we focus on "he" as the male coreferential expression and "she" as the female coreferential expression.
Please note that descriptive nouns like "the actor" also provide a way of referring to an entity. They are not explicitly examined in this paper, as there are many different ways of describing someone. Additionally, such descriptions depend on the appearance and characteristics of the referenced person, and this explicit knowledge is not given through the resources we use here. One might expect the generic phrase "the actor" to be used independently of the specific actor, and thus more frequently overall, but an examination of the test corpus (see section 4.1) shows that this is not the case. However, a complete coreference resolution system is run for comparison with the techniques developed here in section 4.2.

3.2.3 Filtering
To take into account the relevance of information as discussed previously, the feature set for an actor obtained by the name and the coreference approach has to be filtered. The filtering is motivated by the three kinds of irrelevant sentences presented in section 3.1.
A sentence like "Leonardo DiCaprio plays Hugh Glass." gives no information about the actor and hence is irrelevant. However, sentences of this kind are included in the feature set at this point because of the explicit mention of the actor's name. To correct this, the actor's name followed by "plays" is not treated as a mark of relevance.
The phrase "played by", as in "Hugh Glass is played by Leonardo DiCaprio", is also an indicator of irrelevance and is similarly taken out of the sentence set of the involved actor.
The last case is a note about the actor in brackets. An example is "Hugh Glass (Leonardo DiCaprio) is out for vengeance.". To solve this, actor names in brackets are filtered out.
It should be noted that these filtering rules do not exclude every sentence that includes one of the cases from the set of relevant sentences of a specific actor. "Leonardo DiCaprio plays Hugh Glass and DiCaprio is great." is still correctly detected as relevant for DiCaprio.

4. APPLICATION
For the evaluation, the approaches explained in section 3.2 are now implemented. First, an overview of the used tools and the database on which we evaluate is given. Then the results are discussed.

4.1 Database and pipeline
The internet platform IMDb (http://www.imdb.com) is chosen as a freely available database. Ten films from different genres, with 30 reviews each, are selected arbitrarily to build a test corpus for the evaluation of the approaches described above. The selected films are "Blue Valentine", "Cruel Intentions", "Fast & Furious 7", "Philadelphia", "Pretty Woman", "Sex and the City", "Sicario", "Superbad", "The Lord of the Rings: The Fellowship of the Ring" and "Walk the Line". To provide a good text quality, the first 30 reviews according to the IMDb filter "Best" are extracted. For every movie the cast list is crawled, and for every actor the gender.
Every review is processed with Stanford CoreNLP [8]. Stanford CoreNLP was chosen for several reasons. On the one hand, basic features, such as tokenization and part-of-speech tagging, seem to perform well. On the other hand, it provides a coreference resolution system we work with for comparison (see section 4.2). Subsequently, based on the cast list, each sentence is manually annotated with actor names. At times, poor text quality makes it difficult to understand references. Furthermore, the distinction between roles and actors is not consistent in some reviews: instead of using the role name when talking about a character, the actor's name is used. Biopics are hard to handle as well. For example, the movie "Walk the Line" is about the life of Johnny Cash, but the reviews distinguish between the role Johnny Cash and the real Cash.
The evaluation is based on each actor of the ten movies in the test corpus to whom, at least once, reference has been made by some reviewer. For every movie's actor, recall and precision are computed. The F1 measure was chosen in order to include both measures, recall and precision, in the comparison of the approaches. Then an F1 score for each movie is calculated by averaging the F1 scores of the involved actors. In order to compare approaches, an average F1 score is computed over all movies.

4.2 Evaluation
The presented approaches are evaluated sequentially. Actor identification by the full name serves as a baseline, as this is the most intuitive way of finding sentences that could be relevant. The average F1 score for this baseline is 61.7%.

4.2.1 Actor identification through First and Last Names
Firstly, we evaluate whether the use of parts of the actor's name can improve the baseline. Therefore, it is checked whether the test sentences contain the first or the last name. Due to spelling mistakes, we experiment with a constant Levenshtein distance as a threshold and with a threshold that is relative to the word length. For a constant threshold, the values 0, 1, 2 and 3 are tested; for a threshold relative to the word length, the values 1/1, 1/2, 1/3, 1/4, 1/6, 1/8 and 1/16 are tested.
The inclusion of selection through first names yields an average score of 64.7%. The application of the Levenshtein distance does not lead to better results. For short words, already one edit operation can change their meaning. First names tend to be short and therefore respond strongly to allowed deviations. For instance, the name "Ron" transforms into "on" with just one delete operation.
By contrast, the score regarding last names can be maximized to 73.5% by using a Levenshtein distance of 1/3 of the word length. This behavior is as expected: because last names are usually longer, the idea of detecting spelling mistakes like "Gossling" (correctly "Gosling") with the Levenshtein distance works.
After recognizing that both approaches individually enhance the results, we take a closer look at combinations of them. Combining the search for the first and the last name without consideration of spelling mistakes leads to a barely noticeable improvement (73.6%). Overall, by combining first and last name approaches after running the baseline, the recall increases but the precision worsens.
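The relative-threshold matching evaluated above can be sketched as follows (our illustration with a naive tokenizer, not the authors' code; the integer division for the 1/3 threshold is our assumption):

```python
# Illustrative sketch of last-name matching from section 4.2.1: a sentence
# is assigned to an actor when some token is within a Levenshtein distance
# of 1/3 of the last name's length. Matching is done on lowercased tokens.

def levenshtein(a: str, b: str) -> int:
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]

def mentions_last_name(sentence: str, last_name: str) -> bool:
    last = last_name.lower()
    threshold = len(last) // 3  # relative threshold: 1/3 of the word length (rounded down)
    tokens = sentence.lower().replace(",", " ").replace(".", " ").split()
    return any(levenshtein(token, last) <= threshold for token in tokens)
```

For "Gosling" the threshold is 2 edit operations, so the misspelling "Gossling" (distance 1) is still matched, while unrelated words are not.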







By comparison, last names are used more frequently to talk about actors in the test corpus, whereas that does not hold true for roles. The former may originate in the missing personal connection towards the actor. However, as characters in movies rarely address each other using full names, first names are, in the test corpus, more frequently used for descriptions of roles.
These results are achieved with lowercased words. Case sensitivity is also investigated with the same parameters. In the test corpus it is observed that reviewers do not always use capitalization correctly. Some tend to use no capitals at all, while others vary or write names completely in upper case. As expected, the resulting scores are slightly worse in general, and thus we decide to proceed without attention to capitalization.

4.2.2 Actor identification through Coreferences
After the name-based strategies, approaches that use coreferences are evaluated. Based on the name recognition, the personal pronouns "he" and "she" are assigned. For every actor and already selected sentence, the next sentence is observed. Because the gender of each actor is known, a distinction between female and male is possible. If the actor is a woman, the following sentence is searched for "she". Equally, for a male actor the word "he" is required.
As one single personal pronoun only refers to one person, an assignment is only done in a clear case. This means that only one actor of the given gender is found in a sentence and the next sentence contains the fitting personal pronoun. For comparison, we also analyse an assignment to all male/female actors of a sentence. Table 1 shows the results. The first row displays the F1 scores that are reached by combining name recognition through first and last name without allowed deviation (b+first&last) with the coreference approaches explained below. The second row refers to name recognition by only the last name with threshold 1/3 of the word length (b+last(1/3)) as the starting point for the coreference resolution. The columns show the coreference techniques that are combined with b+first&last and b+last(1/3). The first column (0) shows the score achieved by b+first&last and b+last(1/3) alone. (1) stands for the first coreference resolution approach, which only assigns "he" and "she" in a clear case (as described above), whilst (2) represents the approach in which "he" and "she" are also assigned in ambiguous cases.

Table 1: Comparison of the different coreference approaches.

                 (0)     (1)     (2)     (3)     (4)
b+first&last     0.736   0.735   0.733   0.736   0.659
b+last(1/3)      0.735   0.739   0.734   0.736   0.658

The more restrictive version achieves better results, whereas the other version even decreases the performance. However, the improvement with regard to the names-only approach is small. The best F1 score, 73.9%, is reached by b+last(1/3) and the restrictive pronoun resolution (1).
Stanford CoreNLP includes a coreference resolution system. Taking this into account, two further techniques are developed, and their results can be seen in Table 1 as well. Similar to (1), in (3) for every assigned actor it is checked whether a coreference chain to the next sentence exists. If so, the occurrence of "he" or "she", depending on the actor's gender, is the critical factor for a relevant coreference. Since Stanford's coreference resolution in general reveals coreferences of various types, in (4) every chain to the next sentence is regarded as relevant and as an indicator for an assignment. Although an improvement is expected, (3) does not significantly change the F1 score. By incorporating more aspects of coreference, the results are even worse.
For completeness, each of the name recognition approaches mentioned in section 4.2.1 has been combined with (1), (2), (3), and (4). The two that perform best are listed in Table 1.

4.2.3 Improvement through Filtering
Finally, the approaches for filtering are evaluated. The phrases "actor/coreference plays", "played by actor/coreference" and "(actor)" are used for this purpose. They are tested with all possible combinations of name and pronoun resolution as described in sections 4.2.1 and 4.2.2. Using "played by actor/coreference" and "(actor)" as filters after the baseline and assignment by the last name under Levenshtein distance (1/3 of word length) with subsequent use of (1) results in an F1 score of 77.9%.
The individual F1 scores according to this approach are shown in Figure 1. Each bar represents one of the ten movies that are considered in the test corpus. The y-axis shows the reached F1 scores. Depending on the movie, the F1 scores vary widely. For "Pretty Woman" only a score of 55.8% is reached, and the score of 65.7% for "Sex and the City" is also comparatively poor. In contrast, for "Sicario" a score of 89.8% is achieved. Closer inspection of the test corpus reveals that role and actor names are mixed up more in the included reviews for "Pretty Woman" and "Sex and the City" than in the other movies' 30 reviews.

5. CONCLUSIONS AND FUTURE WORK
In this work, different approaches for the assignment of actors to sentences in movie reviews were discussed. An average F1 score of 77.9% was achieved by the presented approaches. The performance varies strongly depending on the movie. For most of the tested movies, good scores between 75% and 90% could be reached. For one movie, only an F1 score slightly above 55% was reached, which must be rated as rather poor.
Names offer good reference points for the treatment of an actor in a sentence. In order to exclude irrelevant sentences, in effect sentences that only mention the actor without giving any information, filtering techniques are useful. Especially the elimination of phrases like "role (actor)" enhanced the performance. To improve the results, further filtering approaches are to be developed. Likewise, the two other filtering approaches presented offer potential for further research on descriptive sentence structures used with "plays" and "played by". Coreference resolution does not lead to a significant improvement in our experiments. In fact, some of the variants even lead to a decrease of the F1 score. A closer examination of coreferential expressions for actors is planned. The recognition of persons by paraphrases has not been discussed further in the course of this paper. Nevertheless, this aspect should not be omitted and needs intense research. Other rule-based techniques that might achieve better results can be developed.
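The best-performing configuration reported above can be sketched end-to-end as follows. This is our illustrative reconstruction, not the authors' code: the name matcher is passed in as a parameter, the gender codes are hypothetical, and the unambiguity check of pronoun rule (1) is omitted for brevity.

```python
# Illustrative sketch of the best pipeline (F1 = 77.9%): name recognition,
# pronoun rule (1) on the following sentence, and the "played by actor" /
# "(actor)" filters from section 4.2.3. `matches_name` stands in for the
# name recognition of section 4.2.1.
import re

def select_sentences(sentences, actor, gender, matches_name):
    """Return indices of sentences considered relevant for `actor`."""
    selected = set()
    for i, s in enumerate(sentences):
        # Filter rules: "played by <actor>" and "(<actor>)" mark irrelevant mentions,
        # so these phrases are blanked out before the name check.
        cleaned = re.sub(r"played by\s+" + re.escape(actor), " ", s, flags=re.I)
        cleaned = re.sub(r"\(\s*" + re.escape(actor) + r"\s*\)", " ", cleaned, flags=re.I)
        if matches_name(cleaned, actor):
            selected.add(i)
            # Pronoun rule (1): also assign the next sentence when it contains
            # the pronoun fitting the actor's gender.
            pronoun = " he " if gender == "m" else " she "
            if i + 1 < len(sentences) and pronoun in " " + sentences[i + 1].lower() + " ":
                selected.add(i + 1)
    return sorted(selected)
```

On the paper's examples, "Hugh Glass is played by Leonardo DiCaprio." is filtered out, while "Leonardo DiCaprio plays Hugh Glass very well." is kept and also pulls in a following sentence starting with "This year he ...".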







[Figure 1 here: a bar chart of per-movie F1 scores (y-axis: F1 score, roughly 0.55 to 0.9) for "Blue Valentine", "Cruel Intentions", "Fast & Furious 7", "Philadelphia", "Pretty Woman", "Sex and the City", "Sicario", "Superbad", "The Lord of the Rings: The Fellowship of the Ring" and "Walk the Line".]

Figure 1: F1 scores for the movies and reviews that are contained in the test corpus when using "played by actor/coreference" and "(actor)" as filters after the baseline and assignment by the last name under Levenshtein distance (1/3 of word length) with subsequent use of (1).



problem. In general, exclamations like ”Great performance!” that state facts about actors and their acting seem hard to handle: such sentences are not correctly assigned by our approaches, as neither a name is mentioned nor an explicit coreference is used. Likewise, the mix-up of role and actor names cannot be resolved without contextual knowledge.

Based on the assignment of sentences, the reviewer's view towards an actor can be analysed. In future work, we want to examine the polarity of what is said about an actor in movie reviews. As an initial approach, a classification into {positive, neutral, negative} will be done; a code book will be used for a manual classification of the test corpus. As an instrument, SentiWordNet [2] or Stanford's sentiment analysis tool could be applied.
                                                                              Conference on Engineering Applications of Neural
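The assignment by last name under a Levenshtein distance of at most one third of the word length, as used in the evaluation above, can be sketched as follows. This is a minimal illustration, not the implementation used in the experiments; the function `match_actor_by_last_name` and its signature are invented for this sketch.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance [7] with unit-cost insertions, deletions, substitutions."""
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def match_actor_by_last_name(word, actors):
    """Assign `word` to the actor whose last name is closest to it,
    provided the distance stays within a third of the word's length."""
    best_actor, best_dist = None, None
    for actor in actors:
        last_name = actor.split()[-1]
        dist = levenshtein(word.lower(), last_name.lower())
        if dist <= len(word) / 3 and (best_dist is None or dist < best_dist):
            best_actor, best_dist = actor, dist
    return best_actor
```

Under this threshold a misspelled review word such as ”Goslin” (distance 1, threshold 6/3 = 2) would still be assigned to Ryan Gosling, while unrelated words remain unmatched.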
6.   REFERENCES
 [1] M. Abulaish, Jahiruddin, M. N. Doja, and T. Ahmad. Feature and Opinion Mining for Customer Review Summarization. In Proceedings of the 3rd International Conference on Pattern Recognition and Machine Intelligence, 2009.
 [2] S. Baccianella, A. Esuli, and F. Sebastiani. SentiWordNet 3.0: An Enhanced Lexical Resource for Sentiment Analysis and Opinion Mining. In N. Calzolari (Conference Chair), K. Choukri, B. Maegaard, J. Mariani, J. Odijk, S. Piperidis, M. Rosner, and D. Tapias, editors, Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10). European Language Resources Association (ELRA), May 2010.
 [3] M. Hu and B. Liu. Mining and Summarizing Customer Reviews. In Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 168–177. ACM, 2004.
 [4] N. Jakob and I. Gurevych. Using Anaphora Resolution to Improve Opinion Target Identification in Movie Reviews. In Proceedings of the ACL 2010 Conference Short Papers, pages 263–268. Association for Computational Linguistics, 2010.
 [5] A. Kennedy and D. Inkpen. Sentiment Classification of Movie Reviews Using Contextual Valence Shifters. Computational Intelligence, 22(2):110–125, 2006.
 [6] A. Koumpouri, I. Mporas, and V. Megalooikonomou. Evaluation of Four Approaches for ”Sentiment Analysis on Movie Reviews”: The Kaggle Competition. In Proceedings of the 16th International Conference on Engineering Applications of Neural Networks (INNS), EANN '15, pages 23:1–23:5. ACM, 2015.
 [7] V. I. Levenshtein. Binary Codes Capable of Correcting Deletions, Insertions, and Reversals. Elsevier Science & Technology, 1965.
 [8] C. D. Manning, M. Surdeanu, J. Bauer, J. Finkel, S. J. Bethard, and D. McClosky. The Stanford CoreNLP Natural Language Processing Toolkit. In Association for Computational Linguistics (ACL) System Demonstrations, pages 55–60, 2014.
 [9] B. Pang and L. Lee. A Sentimental Education: Sentiment Analysis Using Subjectivity Summarization Based on Minimum Cuts. In Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics, ACL '04. Association for Computational Linguistics, 2004.
[10] B. Pang, L. Lee, and S. Vaithyanathan. Thumbs Up?:
     Sentiment Classification Using Machine Learning
     Techniques. In Proceedings of the ACL-02 Conference
     on Empirical Methods in Natural Language Processing
     - Volume 10, EMNLP ’02, pages 79–86. Association
     for Computational Linguistics, 2002.
[11] A. Radford. English Syntax: An Introduction.
     Cambridge, UK: Cambridge University Press, 2004.
[12] K. Tsutsumi, K. Shimada, and T. Endo. Movie Review Classification Based on a Multiple Classifier. In Proceedings of the Annual Meeting of the Pacific Asia Conference on Language, Information and Computation (PACLIC), pages 481–488, 2007.
[13] C. Whitelaw, N. Garg, and S. Argamon. Using
     Appraisal Groups for Sentiment Analysis. In
     Proceedings of the 14th ACM International Conference
     on Information and Knowledge Management, CIKM
     ’05, pages 625–631. ACM, 2005.
[14] L. Zhuang, F. Jing, and X.-Y. Zhu. Movie Review Mining and Summarization. In Proceedings of the 15th ACM International Conference on Information and Knowledge Management, pages 43–50. ACM, 2006.



