=Paper= {{Paper |id=Vol-1619/paper7 |storemode=property |title=Praiseworthy Acts Recognition Using Web-based Knowledge and Semantic Categories |pdfUrl=https://ceur-ws.org/Vol-1619/paper7.pdf |volume=Vol-1619 |authors=Rafal Rzepka,Kohei Matsumoto,Kenji Araki |dblpUrl=https://dblp.org/rec/conf/ijcai/RzepkaMA16 }} ==Praiseworthy Acts Recognition Using Web-based Knowledge and Semantic Categories== https://ceur-ws.org/Vol-1619/paper7.pdf
                                  Praiseworthy Act Recognition
                        Using Web-based Knowledge and Semantic Categories
                                 Rafal Rzepka, Kohei Matsumoto and Kenji Araki
                                Graduate School of Information Science and Technology
                                             Hokkaido University, Japan
                                     {rzepka,matsumoto,araki}@ist.hokudai.ac.jp

Abstract

In this paper we[1] introduce a novel method that uses web mining and semantic categories to determine automatically whether a given act is worth praising. We report how existing lexicons used in affective analysis and ethical judgement can be combined to generate useful queries for knowledge retrieval from a 5.5 billion word blog corpus. We also present how semantic categorization helped the proposed method to finally achieve 94% agreement with human subjects who decided which act, behavior or state should be praised. We also discuss how our preliminary findings might lead to developing an important social skill of a robotic companion or an automatic therapist during their daily interaction with children, elderly or depressed users.

1 Introduction

Predictions from world demographic trends show that the current ratio of people aged sixty or more (12.6%) will nearly double by 2050 (almost 22%)[2]. Younger generations would need to work more and worry more, not only about their aged parents but also about their children, to whom they would dedicate less time. Stress among the working-age group could be caused not only by work itself but also by the awareness that children and parents are often left to their own devices. Data gathered by the American Depression and Bipolar Support Alliance[3] indicate that depression most often strikes at age 32 in the United States, but it is also an obvious problem among other age groups. One child in 33 and one adolescent in eight have clinical depression, and as many as six million elderly people are affected by mood disorders, but only 10% ever receive treatment. Precise numbers are often difficult to obtain as many subjects do not want to participate in studies, do not respond to surveys, do not answer the door or have insufficient language abilities[4]. Problems related to psychological disorders could be alleviated by technological advancements, including progress in Artificial Intelligence, especially in cases of social withdrawal in which depressed adolescents prefer dealing with computers to dealing with people. As psychology studies show [Hofmann et al., 2012], depression can be treated by cognitive behavioral therapies (CBT) as efficiently as with medication, and such treatment is based on conversation. Although computers are already used as supportive tools in CBT [Wright et al., 2005], we are far away from entrusting patients to autonomous therapists. However, we believe that various conversational rules utilized in dialog-based therapies and other positive aspects [Burnard, 2003; Zimmerman et al., 2009] of a conversation itself can be implemented in artificial agents like companion robots [Sarma et al., 2014]. In this paper we introduce our idea of how to utilize Natural Language Processing techniques, a set of lexicons and semantic categories to mine the Web for the knowledge necessary to recognize whether an action that is a dialog topic should be, e.g., complimented by an agent.

1.1   Importance of Praising

We chose the act of praising to be implemented in our artificial agent for a variety of reasons. First of all, it is an evaluation task which positively influences a praised person [Kanouse et al., 1981] and motivates, especially children [Henderlong and Lepper, 2002]. Often seen in interpersonal interaction, praising is used to encourage others, to socialize, to integrate groups, and to influence people [Lipnevich and Smith, 2008]. It is believed to have beneficial effects on self-esteem, motivation and performance [Weiner et al., 1972; Bandura, 1977; Koestner et al., 1987]. It is widely acknowledged that praising oneself could substantially help in dealing with depression [Swann et al., 1992] and that praising improves behavior [Garland et al., 2008], academic performance [Strain et al., 1983] and work performance [Crowell et al., 1988]. But there is another interesting and difficult aspect of praising: the praiser has to be competent and share some relationship with the praised person [Carton, 1996]. Also, from the Artificial Intelligence point of view, the automatic distinction between praiseworthy and not praiseworthy acts is an interesting long-term challenge on the way to creating a righteous and trustworthy machine and, in this particular case, an opportunity to investigate whether Web resources could become a sufficient knowledge base for such tasks. Our hypothesis is that knowing the polarity of the consequences of human acts might be the key to an automatic evaluation of these acts.

[1] Second author is currently with Panasonic Co.
[2] www.unfpa.org/ageing
[3] www.dbsalliance.org
[4] www.nimh.nih.gov/health/statistics/prevalence/major-depression-among-adults.shtml

Proceedings of the 4th Workshop on Sentiment Analysis where AI meets Psychology (SAAIP 2016), IJCAI 2016, pages 41-47, New York City, USA, July 10, 2016.

1.2     State of the Art
The authors have found only one research proposal dedicated specifically to automating praising. In a two-page paper, [Tejima et al., 1998] describe their observations from physiotherapists’ sessions with elderly patients. The researchers proposed a simple verbal encouragement algorithm for walking training and implemented it later [Tejima and Bunki, 2001]; however, its effectiveness could not be confirmed due to the insufficient number of experimental subjects. Causing positive moods in interlocutors can be found as a sub-task in the Human-Computer Interaction (HCI) field, especially in learning-oriented agents [Fogg and Nass, 1997; Kaptein et al., 2010], but these studies utilize scenarios and manually created rules for when to praise. Systems that accept, in theory, any sentence as an input and recognize polarity or emotive categories were proposed in the fields of sentiment analysis and affect recognition [Wilson et al., 2005; Strapparava and Mihalcea, 2008], and the basic idea for our system is borrowed from their approaches. However, these methods cannot be utilized straightforwardly because being positive does not have to mean an act is worth praising (“I saw a movie” is labelled positive by these methods but it usually does not mean we need to react with a compliment to such a statement). For the English language there are promising methods for retrieving goodFor and badFor events [Deng and Wiebe, 2014] and for acquiring knowledge of stereotypically positive and negative events from personal blogs [Ding and Riloff, 2016]. Basically any new trend in the field [Cambria et al., 2013; Socher et al., 2013] should eventually help improve our results as soon as it is implemented for the Japanese language, which often has far fewer resources to keep up with the latest methods. For Japanese, [Rzepka and Araki, 2015] have proposed a system that evaluates textual inputs from a moral perspective. Similarly to our approach it uses lexicons, and one of them, based on Kohlberg’s theory of moral stages development [Kohlberg, 1981], includes praise-punishment polarized pairs. However, the lexicon contains only 14 praise-related words limited to synonyms of the verb “praise” which, as shown later in the comparison experiment, are insufficient for our purposes.

2 System Overview

The algorithm of our system is presented in Figure 1. In the first step an input act (a noun-verb pair, which we treat as the minimal semantic unit describing any act) in the Japanese language is morphologically analyzed by MeCab[5] to determine a noun, a verb and the joining particle representing grammatical case (e.g. aisatsu-o suru “to greet someone” or yakusoku-o mamoranai “not keeping promises”, where the particle “o” indicates an object of the given verb). Then the system adds to the verb 15 suffixes representing conditions and temporal sequences to retrieve more adequate sentences (waruguchi itta ato “after calling names”, waruguchi iu toki “when called names”, waruguchi itte “called names and then”, etc.). Because particles are often omitted in colloquial Japanese, another set of 15 phrases without particles is created, and the final 30 phrases together with phrases with verbs in their basic (dictionary) form become queries for the 5.5 billion word YACIS corpus of Japanese raw blogs [Ptaszynski et al., 2012]. Text retrieved from the corpus is then cleaned: emoticons usually used as sentence boundaries are converted to full stops, and sentence candidates that are too long or too short are deleted. In the next step, the generated temporary corpus of sentences containing input acts is normalized to verb dictionary forms and divided into meaningful chunks by the Argument Structure Annotator ASA [Takeuchi et al., 2010] to avoid the overly granular division produced by the morphological analyzer. For instance “was | beat | ing | brother” becomes “beat brother”, and such transitions are made to increase the coverage of matching chunks with phrases from lexicons in the next step. Every match is scored 1 and the totals are compared. If positive or negative matches make up more than 50% of the total, the act is estimated as praiseworthy or not praiseworthy accordingly. Although in the morality estimation task a 60/40 ratio scored highest [Rzepka and Araki, 2015], in our task the 50/50 ratio achieved better results.

Figure 1: Algorithm for retrieving and analyzing consequences of acts in order to determine if they should be praised.

[5] taku910.github.io/mecab/

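The matching and voting steps described in Section 2 can be illustrated with a minimal sketch. It is not the authors' implementation: whitespace tokenization of romanized or English text stands in for MeCab and ASA, the verb forms are supplied by the caller instead of being generated from the 15 suffixes, and all function names, the toy lexicon and the example sentences are assumptions made for illustration. Only the text to the right of the act phrase is matched, each match counts 1, and a label is assigned when one polarity exceeds 50% of the matches.

```python
from collections import Counter


def build_queries(noun, particle, verb_forms):
    """Combine a noun with each verb form, with and without the case particle."""
    with_particle = [f"{noun} {particle} {form}" for form in verb_forms]
    without_particle = [f"{noun} {form}" for form in verb_forms]
    return with_particle + without_particle


def count_consequences(sentences, query, positive, negative):
    """Count lexicon hits found to the right of the act phrase."""
    counts = Counter()
    for sentence in sentences:
        _, found, right_side = sentence.partition(query)
        if not found:
            continue  # the act phrase does not occur in this sentence
        for chunk in right_side.split():
            if chunk in positive:
                counts["positive"] += 1  # every match is scored 1
            elif chunk in negative:
                counts["negative"] += 1
    return counts


def judge(counts, threshold=0.5):
    """Label an act from the share of positive and negative matches."""
    total = counts["positive"] + counts["negative"]
    if total == 0:
        return "unmatched"
    if counts["positive"] / total > threshold:
        return "praiseworthy"
    if counts["negative"] / total > threshold:
        return "not praiseworthy"
    return "undecided"


# Toy data (hypothetical): English stand-ins for retrieved blog sentences.
positive = {"happy", "trusted"}
negative = {"angry", "sad"}
sentences = [
    "when I keep promise people are happy",
    "sad at first but keep promise and be trusted",
    "keep promise or teacher gets angry",
]
counts = count_consequences(sentences, "keep promise", positive, negative)
print(counts["positive"], counts["negative"], judge(counts))  # 2 1 praiseworthy
```

In the second toy sentence the word “sad” stands to the left of the act phrase and is therefore ignored, which mirrors the assumption that consequences are usually written after the act.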
2.1   Lexicons

As mentioned in the introduction, we hypothesized that measuring the polarity of act consequences might be the key to recognizing praiseworthy acts. Although aware of the possible problems mentioned in the Introduction, we decided to investigate how efficiently the existing emotion recognition methods could deal with our task. Therefore we first chose two different freely available lexicons used for lexicon-based polarity recognition in the Japanese language. The larger one was statistically generated from manually annotated sentences in the study of [Takamura et al., 2005]. It contains 55,102 words divided into positive (5,121 words) and negative (49,981 words) ones. Every word was automatically scored on a scale from -1 (minimum) to 1 (maximum), and the words closer to 0 tend to be inaccurately labeled (e.g. okaasan “mom” or narubeku “as possible” are marked as negative words); therefore using the whole (significantly unbalanced) set would cause drops in accuracy. In order to minimize this problem and to make the lexicon more balanced, after analyzing the entries we used the 3,000 most positive and the 3,000 most negative words (closest to 1 and -1 respectively) and called the result “Statistical Lexicon”.

Another lexicon used in polarity detection in Japanese texts was created manually by [Nakamura, 1993] from emotive sentences retrieved from Japanese literature. The words are separated into ten categories (Like, Joy, Relief, Dislike, Anger, Fear, Shame, Sadness, Excitement, Surprise), and because Excitement and Surprise have no distinct valence, these two categories were excluded. The combined words from Like, Joy and Relief form a positive subset, and Dislike, Anger, Fear, Shame and Sadness form a negative one. The resulting lexicon of 526 positive and 756 negative words (1,282 in total) we call here “Literature Lexicon” to make the comparison between lexicons easier to follow.

As mentioned before, a positive act is not necessarily praiseworthy, therefore we decided also to test a lexicon used for ethical judgement by [Rzepka and Araki, 2015]. This relatively small set, containing 65 positive and 69 negative words (134 in total), was created by applying phrases related to the five stages of moral development proposed by [Kohlberg, 1981]: obedience / punishment, self-interest, social norms, authority / social-order, and social contract. For example, in the obedience / punishment subset there are words like “punished”, “awarded”, “punishment”, “award”, and authority / social order contains law-related words like “sentenced”, “legal” or “arrested”. To examine how emotional and social consequences work together, we created another lexicon, a combination of the Kohlberg’s theory-based set with Nakamura’s literature-based set. We named the Kohlberg-based set “Ethical Lexicon” and the combination “Combined Lexicon”.

3 Experiments and Results

In this section we introduce experiments we conducted to investigate the effectiveness of our approach in the task of automatic praiseworthy act recognition.

3.1   Input Acts

Web resources used in the study give an opportunity to process any kind of act, but this freedom causes difficulties with choosing a fair and balanced input. To deal with this problem we created two sets, one generated automatically and evaluated by subjects, and a second one created by the same subjects, who were specifically instructed to give examples of praiseworthy and not praiseworthy acts different from those which they had labeled. By introducing these two types we tried to find a balance between “any input” (because the algorithm should recognize neutral acts) and a more specific, manually crafted set of correct data.

Automatically Generated Set

For creating the first set we utilized 200 verbs from the Statistical Lexicon with the highest hit number in the blog corpus (100 from the positive subset and 100 from the negative subset) and paired them with the nouns most frequently co-occurring within the Japanese Frames dataset automatically generated from the biggest Japanese Web corpus [Kawahara and Kurohashi, 2006]. In order to limit the number of acts and to maintain sufficient coverage (to observe to what extent the automatically polarized words are efficient), we added two conditions. The noun object must be included in the Statistical Lexicon and the generated act must appear at least ten times in the blog corpus. Hence, if e.g. the verb “keep” from the lexicon was co-occurring frequently with the object noun “promise” and the phrase “to keep a promise” was found more than 10 times in the blog corpus, the phrase was treated as a common human act and became an input. With this method we generated 119 acts, which were then evaluated by three judges (one female in her fifties, one male university student and one female secondary school pupil) who labeled each act as praiseworthy, not praiseworthy or hard to tell. The majority vote (three judges agreed, or two agreed and the third answered “hard to tell”) resulted in 54 acts: 31 worth praising, such as tomodachi-o iwau (“to congratulate a friend”) or chichi-o shitau (“to admire one’s father”), and 23 not worth praising, such as tanin-o nikumu (“to hate somebody”) or itami-o shiiru (“to impose pain upon someone”). Two examples of acts on which agreement was not reached are hiza-o kussuru (“to bend one’s knees / to yield to someone”) and yami-o kowagaru (“to be afraid of darkness”). The labeled data became both the input and the first correct data set, and we named it “Automatically Generated Set”.

Manually Created Set

Because the automatically retrieved input set was biased toward the Statistical Lexicon, we asked the same group of three people to think of acts worth praising and not worth praising. The created set (from now on called “Manually Created Set”) contained 64 acts: 32 praiseworthy ones, such as shiken-ni goukaku suru (“passing an exam”) or tetsudai-o suru (“helping someone”), and the remaining 32 not worth praising, such as yakusoku-o mamoranai (“not to keep a promise”) or kenka-o suru (“to quarrel / to have a fight”). Unlike the Automatically Generated Set, and although the creators had seen examples of acts in the evaluation process, the Manually Created Set was not restricted and in consequence included more diverse forms containing not only negations but also adverbs and passive /
double verbs as in jiko-chuushin-teki ni koudou-o suru (“to act selfishly”) and iwareta koto-o yaranai (“not to do what one was told”).

3.2   Effectiveness Comparison between Lexicons

Having prepared two sets of acts with their human evaluation, we performed a series of experiments to examine our system’s accuracy when using the lexicons described above in the task of recognizing praiseworthy acts.

Table 1: Results for Automatically Generated Set of input acts.
                        Matched / All   Correct
  Statistical Lexicon     54 / 54       83.3%
  Literature Lexicon      42 / 54       66.7%
  Ethical Lexicon         17 / 54       58.8%
  Combined Lexicon        45 / 54       68.9%

Table 2: Results for the Manually Created Set of input acts.
                        Matched / All   Correct
  Statistical Lexicon     52 / 64       63.5%
  Literature Lexicon      45 / 64       84.4%
  Ethical Lexicon         39 / 64       84.6%
  Combined Lexicon        44 / 64       90.9%

Statistical Lexicon

Tested with acts from the Automatically Generated Set, the Statistical Lexicon achieved 83.3% correct recognitions. To confirm our assumption that matching should be performed only on the right side of an act phrase, because this is where consequences of the act are usually written (see Figure 2), we also ran additional tests and confirmed that analyzing left sides achieves significantly lower accuracy (66.7%). Matching within the whole sentence did not bring any improvement in results, and it doubled the searching time. Examples of correctly recognized acts are shouri-o iwau (“to celebrate victory”) and kenkou-o mamoru (“to care about one’s health”). On the other hand, tsumi-o kuiru (“to regret one’s sins”) or shi-o kanashimu (“to grieve one’s death”) were recognized incorrectly due to noisy polarity in the Statistical Lexicon.

When tested with the Manually Created Set, the results of the Statistical Lexicon dropped as expected. Left side matching brought only 53.7% correct recognitions, while the right side matching again surpassed the left side, achieving 63.5%, and the whole sentences scored significantly lower (58.2%). All other comparisons of results between left side, right side and whole sentences confirmed this trend; therefore, in order to avoid confusion, all remaining results we introduce are from the matches performed on the right sides following input act phrases.

Literature Lexicon

The Literature Lexicon surpassed the much larger Statistical Lexicon when Manually Created Set acts were input but was significantly less accurate with acts from the Automatically Generated Set, which was derived from the Statistical Lexicon (see Table 1 and Table 2). The Statistical Lexicon’s perfect matching rate on that set (54/54 matched) may suggest that if a new, less noisy method for the automatic estimation of word polarity is proposed and it covers all words in every possible input, the Statistical Lexicon would outperform the Literature Lexicon also when fed with acts from the Manually Created Set. Nevertheless, it would be very costly and avoiding polarizing neutral words seems to be difficult, hence we believe that using manually crafted, small lexicons is currently a more realistic approach for the automatic recognition (and annotation) of praiseworthy acts.

Ethical Lexicon

The smallest of all the lexicons used, based on Kohlberg’s theory and utilized in the automatic ethical recognition task, performed worst when the Automatically Generated Set of acts was input but outperformed both the Statistical and Literature Lexicons when the Manually Created Set of acts was used.

Combined Lexicon

We managed to confirm that the combination of the Ethical and Literature Lexicons performed better than either of them separately when the Manually Created Set of acts was used. However, on sentences retrieved with the Automatically Generated Set of acts its accuracy was still lower than that of the Statistical Lexicon.

3.3   Additional Experiments

As we aim at recognizing praiseworthy acts in everyday conversation, the correct recognition of more natural input acts is more important than the correct recognition of less natural input acts. To check whether the Statistical Lexicon could perform better with the Manually Created Set, we conducted a series of additional tests varying the number of positive and negative words to see if the heuristically chosen size of 3,000 was correct. We examined 10 sizes, starting from 500 words and increasing by 500 each time up to 5,000, and also tested the whole unbalanced list from -1 to 1. It appeared (see Figure 3) that accuracy grows up to 1,500 words (an increase from 72.9% to 80.8%), but when larger sets are used, the results start to decrease and never exceed those of the Literature Lexicon (84.4%).

4 Adding Semantic Categories

After analyzing sentences which include a praiseworthy act but were not counted due to the insufficient number of words in lexicons, we decided to examine if we could automatically add some valuable information to other words and see if the information influences the recognition of acts worth praising. We chose semantic categorization and used “Bunrui-Goi-hyo” (Word List by Semantic Principles) [NLRI, 1964], containing 32,600 semantically categorized words collected from 90 contemporary Japanese newspapers. For example, the list groups words under categories such as “Thoughts / Opinions / Doubts”, “Helping / Rescuing” or “Profit / Loss”. Our idea was to add simple weights (count +1) to words that belong to categories which tend to be praiseworthy. In order to examine which categories reveal such tendencies, we retrieved from the corpus all sentences containing acts labeled by human subjects
      Figure 2: Example sentence from the corpus with input act and a matched Ethical Lexicon word on the right side.


                                                                       entirely, the semantic categories alone achieved slightly bet-
                                                                       ter precision than Ethical Lexicon when the Automatically
                                                                       Generated Set of acts was input. The highest precision when
                                                                       Manually Created Set was used increased the precision of Lit-
                                                                       erature and Ethical Lexicons achieving 94%.


                                                                       5 Conclusion, Future Work and Discussion
                                                                       In this paper we introduced a simple matching algorithm al-
                                                                       lowing an agent to recognize human acts worth praising with
                                                                       maximal 94% agreement with human subjects by using lex-
                                                                       icons (words sets) and Web resources (a blog corpus). The
                                                                       best results were achieved by Literature and Combined Lex-
                                                                       icons with Semantic Categories support when manually cre-
                                                                       ated example acts were input. There is still plenty of room
Figure 3: Results of additional experiments for investigating          for improvement and we plan to increase the coverage of lex-
accuracy changes when using different sizes of the Statistical         icons by matching synonyms, too. We also are experiment-
Lexicon.                                                               ing with changing counting method according to adverbs pre-
                                                                       ceding matching phrases (“a little bit sad” could be scored
                                                                       lower than e.g. “so freaking sad”). As the act of praising is
as praiseworthy and not praiseworthy. Then a simple script             very subjective and depends on many factors, we are plan-
counted how many other words in both datasets belong to                ning to perform wide, possibly intercultural, surveys. We
which semantic category. For example if a blog sentence                would like to conclude with underlining a wider importance
was “I lost the confidence in myself after he spoke ill about          of the ability to automatically recognize praiseworthy acts by
me”, the script was adding negative points to categories such as
“Profit / Loss” (lost) or “Thoughts / Opinions / Doubts” (confidence).
Because some categories contained thousands of words and others only a few,
we decided to assign weights according to differences between frequencies.
Examples of categories with distinctly different frequencies are shown in
Table 3. Then, in order to reduce the imbalance between the sizes of the two
sets of categories, we experimented with combinations of weight sets and
discovered that accuracy is highest for both praiseworthy and not
praiseworthy acts when the former uses weights created from group b) and the
latter uses weights created from group c) (refer to Table 3).

4.1   Result Comparison
To see if semantic categorization is effective, we repeated all experiments,
scoring not only matched lexicon words but also other words that belong to
specific categories (those with tendencies to be praiseworthy or not
praiseworthy). Because the semantic categories supposedly specific to
praiseworthy acts included ones like Losing and Disappointment, we expected
rather low accuracy, but, quite surprisingly, semantic weighting helped
improve all previous results (see Table 4 and Table 5). Even when we excluded
lexicon words count

a machine. Recent worries about Artificial Intelligence taking control over
its users could be, at least in our opinion, eased by positive examples.
Companion robots, while helping at home and, e.g., running memory quizzes for
users with Alzheimer's disease, need to be trusted, and gaining that trust
will be difficult without sharing similar values. Our common recognition and
evaluation of a fellow human’s behavior can be measured with shallow
sentiment analysis techniques on vast textual data which express our
experiences and feelings. The proposed method demonstrates that noisy Web
resources like blogs, when processed carefully, can become one way to equip
artificial agents with a human-like capacity for telling right from wrong
without leaning on any specific philosophy or religion. We believe that a
trustworthy machine should operate on estimates of overall positive and
negative consequences rather than on explicit rules decided by one or only a
few programmers. The proposed system can easily “explain” its decisions by
giving examples of retrieved experiences or by presenting a voting ratio,
while most machine-learning-based methods are “black boxes” and may lead to
trust issues. That said, we believe that our method could help to
automatically annotate data, which is crucial for machine learning.



    Table 3: Examples of frequency differences of semantic categories specific to praiseworthy and not praiseworthy acts
            Difference                Praiseworthy acts
            a) More than 4 times:     Helping / Rescuing, Giving / Receiving, Profit / Loss, Winning / Losing,
                                      School / Military, Lending / Borrowing, Physiology, Marking / Signing, etc.
            b) More than 3 times:     Talents, Planning, Specialist jobs,
                                      Associations / Groups, Events / Ceremonies, etc.
            c) More than 2 times:     Economy / Income / Expenditure, Formation, Meaning / Problem / Purpose,
                                      Desire / Expectance / Disappointment, etc.

           Difference                 Not praiseworthy acts
           a) More than 4 times:      Respecting / Thanking / Trusting, Creating / Writing, Old / New / Slow / Fast, Treatment,
                                      Graphs / Tables / Formulas, etc.
           b) More than 3 times:      Acquisition, Eye / Mouth / Nose functions, Roads / Bridges,
                                      Land vehicles, Fear / Anger, etc.
           c) More than 2 times:      Linguistic activities, Birds, Associations, Distress / Sorrow,
                                      Partners / Colleagues, etc.
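
The grouping in Table 3 can be illustrated with a small sketch. It is not the authors' script: the function names, the toy counts and the numeric weight attached to each group are all assumptions made for illustration, since the text does not give the actual weight values. The sketch only shows one plausible way of turning "more than 4 / 3 / 2 times" frequency differences into signed category weights and summing them for an act.

```python
# Hypothetical thresholds mirroring groups a), b) and c) of Table 3.
GROUPS = (("a", 4.0), ("b", 3.0), ("c", 2.0))


def frequency_group(own_count, other_count):
    """Return 'a', 'b', 'c' or None, depending on how many times more often
    a category occurs on one side than on the other."""
    if other_count == 0:
        # Assumption: a category seen on one side only is maximally specific.
        return "a" if own_count > 0 else None
    ratio = own_count / other_count
    for label, threshold in GROUPS:
        if ratio > threshold:
            return label
    return None


def category_weights(praise_counts, other_counts, group_weight):
    """Map each category to a signed weight: positive when it is specific to
    praiseworthy acts, negative when specific to not praiseworthy acts.
    Categories below the lowest threshold get no weight at all."""
    weights = {}
    for cat in set(praise_counts) | set(other_counts):
        p = praise_counts.get(cat, 0)
        n = other_counts.get(cat, 0)
        if p >= n:
            group, sign = frequency_group(p, n), 1.0
        else:
            group, sign = frequency_group(n, p), -1.0
        if group is not None:
            weights[cat] = sign * group_weight[group]
    return weights


def score_act(categories_of_retrieved_words, weights):
    """Sum the weights of the categories found for one act; a positive total
    leans towards 'praiseworthy', a negative one towards 'not praiseworthy'."""
    return sum(weights.get(c, 0.0) for c in categories_of_retrieved_words)


# Toy counts, invented for this example only (category names from Table 3).
praise_counts = {"Helping / Rescuing": 45, "Talents": 35,
                 "Fear / Anger": 10, "Birds": 4}
other_counts = {"Helping / Rescuing": 10, "Talents": 10,
                "Fear / Anger": 35, "Birds": 9}
# Assumed weight per group; the real values are not stated in the text.
group_weight = {"a": 3.0, "b": 2.0, "c": 1.0}

weights = category_weights(praise_counts, other_counts, group_weight)
score = score_act(["Helping / Rescuing", "Birds"], weights)
```

Using a different `group_weight` mapping for each side would be one way to reproduce the experiments with combinations of weight sets mentioned in the text, although how the original weight sets were actually built is not specified there.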


Table 4: Effectiveness comparison of implementing semantic
categories (Automatically Generated Set).
                                    Matched / All   Correct
      Semantic Category (SC)          52 / 54       78.8%
      Statistical Lexicon + SC        54 / 54       85.2%
      Literature Lexicon + SC         54 / 54       81.5%
      Ethical Lexicon + SC            52 / 54       76.9%
      Combined Lexicon + SC           54 / 54       85.2%

Table 5: Effectiveness comparison of implementing semantic
categories (Manually Created Set).
                                    Matched / All   Correct
      Semantic Category (SC)          50 / 64       92.0%
      Statistical Lexicon + SC        52 / 64       88.5%
      Literature Lexicon + SC         50 / 64       94.0%
      Ethical Lexicon + SC            50 / 64       90.0%
      Combined Lexicon + SC           50 / 64       94.0%

References

[Bandura, 1977] Albert Bandura. Self-efficacy: toward a unifying theory of
  behavioral change. Psychological review, 84(2):191, 1977.
[Burnard, 2003] Philip Burnard. Ordinary chat and therapeutic conversation:
  phatic communication and mental health nursing. Journal of Psychiatric and
  Mental Health Nursing, 10(6):678–682, 2003.
[Cambria et al., 2013] E. Cambria, B. Schuller, Yunqing Xia, and C. Havasi.
  New avenues in opinion mining and sentiment analysis. Intelligent Systems,
  IEEE, 28(2):15–21, March 2013.
[Carton, 1996] John S Carton. The differential effects of tangible rewards
  and praise on intrinsic motivation: A comparison of cognitive evaluation
  theory and operant theory. The Behavior Analyst, 19(2):237, 1996.
[Crowell et al., 1988] Charles R Crowell, D Chris Anderson, Dawn M Abel, and
  Joseph P Sergio. Task clarification, performance feedback, and social
  praise: Procedures for improving the customer service of bank tellers.
  Journal of Applied Behavior Analysis, 21(1):65–71, 1988.
[Deng and Wiebe, 2014] Lingjia Deng and Janyce Wiebe. Sentiment propagation
  via implicature constraints. In EACL, pages 377–385, 2014.
[Ding and Riloff, 2016] Haibo Ding and Ellen Riloff. Acquiring knowledge of
  affective events from blogs using label propagation. In Proceedings of the
  Thirtieth AAAI Conference on Artificial Intelligence (AAAI-16), 2016.
[Fogg and Nass, 1997] B.J. Fogg and C. Nass. Silicon sycophants: the effects
  of computers that flatter. International Journal of Human-Computer Studies,
  46(5):551–561, 1997.
[Garland et al., 2008] Ann F Garland, Kristin M Hawley, Lauren
  Brookman-Frazee, and Michael S Hurlburt. Identifying common elements of
  evidence-based psychosocial treatments for children’s disruptive behavior
  problems. Journal of the American Academy of Child & Adolescent Psychiatry,
  47(5):505–514, 2008.
[Henderlong and Lepper, 2002] Jennifer Henderlong and Mark R Lepper. The
  effects of praise on children’s intrinsic motivation: a review and
  synthesis. Psychological bulletin, 128(5):774, 2002.
[Hofmann et al., 2012] Stefan G Hofmann, Anu Asnaani, Imke JJ Vonk, Alice T
  Sawyer, and Angela Fang. The efficacy of cognitive behavioral therapy: a
  review of meta-analyses. Cognitive therapy and research, 36(5):427–440,
  2012.
[Kanouse et al., 1981] David E Kanouse, Peter Gumpert, and Donnah
  Canavan-Gumpert. The semantics of praise. New directions in attribution
  research, 3:97–115, 1981.
[Kaptein et al., 2010] Maurits Kaptein, Panos Markopoulos, Boris Ruyter, and
  Emile Aarts. Two acts of social intelligence: the effects of mimicry and
  social praise on the evaluation of an artificial agent. AI & SOCIETY,
  26(3):261–273, 2010.
[Kawahara and Kurohashi, 2006] Daisuke Kawahara and Sadao Kurohashi. A
  fully-lexicalized probabilistic model for Japanese syntactic and case
  structure analysis. In Proceedings of the Main Conference on Human Language
  Technology Conference of the North American Chapter of the Association of
  Computational Linguistics, HLT-NAACL ’06, pages 176–183, Stroudsburg, PA,
  USA, 2006. Association for Computational Linguistics.
[Koestner et al., 1987] Richard Koestner, Miron Zuckerman, and Julia
  Koestner. Praise, involvement, and intrinsic motivation. Journal of
  personality and social psychology, 53(2):383, 1987.
[Kohlberg, 1981] Lawrence Kohlberg. The Philosophy of Moral Development.
  Harper and Row, 1st edition, 1981.
[Lipnevich and Smith, 2008] Anastasiya A Lipnevich and Jeffrey K Smith.
  Response to assessment feedback: The effects of grades, praise, and source
  of information. ETS Research Report Series, 2008(1):i–57, 2008.
[Nakamura, 1993] Akira Nakamura. Kanjo hyogen jiten [Dictionary of Emotive
  Expressions]. Tokyodo Publishing, 1993.
[NLRI, 1964] National Language Research Institute NLRI. Bunrui Goi Hyo (Word
  List by Semantic Principles, in Japanese). Shuei Shuppan, 1964.
[Ptaszynski et al., 2012] Michal Ptaszynski, Pawel Dybala, Rafal Rzepka,
  Kenji Araki, and Yoshio Momouchi. Yacis: A five-billion-word corpus of
  Japanese blogs fully annotated with syntactic and affective information. In
  Proceedings of The AISB/IACAP World Congress, pages 40–49, 2012.
[Rzepka and Araki, 2015] Rafal Rzepka and Kenji Araki. Rethinking Machine
  Ethics in the Age of Ubiquitous Technology, chapter Semantic Analysis of
  Bloggers Experiences as a Knowledge Source of Average Human Morality, pages
  73–95. Hershey: IGI Global, 2015.
[Sarma et al., 2014] Bandita Sarma, Amitava Das, and Rodney D Nielsen. A
  framework for health behavior change using companionable robots. INLG 2014,
  page 103, 2014.
[Socher et al., 2013] Richard Socher, Alex Perelygin, Jean Y Wu, Jason
  Chuang, Christopher D Manning, Andrew Y Ng, and Christopher Potts.
  Recursive deep models for semantic compositionality over a sentiment
  treebank. In Proceedings of the conference on empirical methods in natural
  language processing (EMNLP), pages 1631–1642. Citeseer, 2013.
[Strain et al., 1983] Phillip S Strain, Deborah L Lambert, Mary Margaret
  Kerr, Vaughan Stagg, and Donna A Lenkner. Naturalistic assessment of
  children’s compliance to teachers’ requests and consequences for
  compliance. Journal of Applied Behavior Analysis, 16(2):243–249, 1983.
[Strapparava and Mihalcea, 2008] Carlo Strapparava and Rada Mihalcea.
  Learning to identify emotions in text. In Proceedings of the 2008 ACM
  symposium on Applied computing, pages 1556–1560. ACM, 2008.
[Swann et al., 1992] William B Swann, Richard M Wenzlaff, and Romin W
  Tafarodi. Depression and the search for negative evaluations: more evidence
  of the role of self-verification strivings. Journal of Abnormal Psychology,
  1992.
[Takamura et al., 2005] Hiroya Takamura, Takashi Inui, and Manabu Okumura.
  Extracting semantic orientations of words using spin model. In Proceedings
  of the 43rd Annual Meeting on Association for Computational Linguistics,
  pages 133–140. Association for Computational Linguistics, 2005.
[Takeuchi et al., 2010] Koichi Takeuchi, Suguru Tsuchiyama, Masato Moriya,
  and Yuuki Moriyasu. Construction of argument structure analyzer toward
  searching same situations and actions. Technical Report 390, IEICE
  technical report. Natural language understanding and models of
  communication, Jan 2010.
[Tejima and Bunki, 2001] Noriyuki Tejima and Hitomi Bunki. Feasibility of
  measuring the volition level in elderly patients when using audio
  encouragement during gait training physical therapy. In Engineering in
  Medicine and Biology Society, 2001. Proceedings of the 23rd Annual
  International Conference of the IEEE, volume 2, pages 1393–1395. IEEE,
  2001.
[Tejima et al., 1998] Noriyuki Tejima, Yoko Takahashi, and Hitomi Bunki.
  Verbal-encouragement algorithm in gait training for the elderly. In
  Engineering in Medicine and Biology Society, 1998. Proceedings of the 20th
  Annual International Conference of the IEEE, volume 5, pages 2724–2725.
  IEEE, 1998.
[Weiner et al., 1972] Bernard Weiner, Heinz Heckhausen, and Wulf-Uwe Meyer.
  Causal ascriptions and achievement behavior: a conceptual analysis of
  effort and reanalysis of locus of control. Journal of personality and
  social psychology, 21(2):239, 1972.
[Wilson et al., 2005] Theresa Wilson, Janyce Wiebe, and Paul Hoffmann.
  Recognizing contextual polarity in phrase-level sentiment analysis. In
  Proceedings of the conference on human language technology and empirical
  methods in natural language processing, pages 347–354. Association for
  Computational Linguistics, 2005.
[Wright et al., 2005] Jesse H. Wright, Andrew S. Wright, Anne Marie Albano,
  Monica R. Basco, L. Jane Goldsmith, Troy Raffield, and Michael W. Otto.
  Computer-assisted cognitive therapy for depression: Maintaining efficacy
  while reducing therapist time. The American Journal of Psychiatry,
  162(6):1158–64, Jun 2005.
[Zimmerman et al., 2009] Frederick J Zimmerman, Jill Gilkerson, Jeffrey A
  Richards, Dimitri A Christakis, Dongxin Xu, Sharmistha Gray, and Umit
  Yapanel. Teaching by listening: The importance of adult-child conversations
  to language development. Pediatrics, 124(1):342–349, 2009.


