<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Predicting Emotion Labels for Chinese Microblog Texts</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Zheng Yuan</string-name>
          <email>yuanzheng.liliian@hotmail.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Matthew Purver</string-name>
          <email>m.purver@qmul.ac.uk</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff1">
          <label>1</label>
          <institution>School of Electronic Engineering and Computer Science Queen Mary University of London Mile End Road</institution>
          ,
          <addr-line>London E1 4NS</addr-line>
        </aff>
      </contrib-group>
      <fpage>40</fpage>
      <lpage>47</lpage>
      <abstract>
        <p>We describe an experiment in detecting emotions in texts on the Chinese microblog service Sina Weibo using distant supervision with various author-supplied conventional labels (emoticons and smilies). Existing word segmentation tools proved unreliable; better accuracy was achieved using character-based features. Accuracy varied according to emotion and labelling convention: while smilies are used more often, emoticons are more reliable. Happiness is the most accurately predicted emotion (85.9%). The approach works well, achieving 80% accuracy for “happy” and “fear”, although performance differs considerably across emotion classes.</p>
      </abstract>
      <kwd-group>
        <kwd>Social Media</kwd>
        <kwd>Sina Weibo</kwd>
        <kwd>Emotion Detection</kwd>
        <kwd>Emoticons</kwd>
        <kwd>Smilies</kwd>
        <kwd>Distant Supervision</kwd>
        <kwd>N-gram lexical features</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Social media has become a very popular communication tool among Internet users.
Sina Weibo (hereafter Weibo) is a Chinese microblog website that is widely regarded as
the Chinese version of Twitter. It is one of the most popular sites in China, used by
well over 30% of Internet users, a market penetration similar to that of Twitter in the
USA (Rapoza, 2011 [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]). It has therefore become a valuable
source of people’s opinions and sentiments.
      </p>
      <p>Microblog texts (statuses) are very different from newspaper or general web text.
Weibo statuses are shorter and more casual, cover many topics, and show less coherence
between texts. Combined with the great lexical and syntactic variety of Weibo data
(misspelt words, new words, emoticons, unconventional sentence structures), this makes
many existing methods for emotion and sentiment detection that depend on grammar- or
lexicon-based information unsuitable.</p>
      <p>
        Machine learning via supervised classification, on the other hand, is robust to such
variety but usually requires hand-labeled training data. Labelling is difficult and
time-consuming for large datasets, and can be unreliable when the task is to infer an
author's emotional state from short texts (see e.g. Purver &amp; Battersby, 2012 [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]). Our
solution is to use distant supervision: we adapt the approach of (Go et al., 2009 [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ];
Purver &amp; Battersby, 2012 [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]) to Weibo data, using emoticons and Weibo's built-in
smilies as author-generated emotion labels to train a classifier that assigns Weibo
statuses to basic emotion classes. Adapting this approach to Chinese data poses several
research problems: finding accurate and reliable labels, segmenting Chinese text, and
extracting sensible lexical features.
      </p>
      <p>Our experiments show that the choice of labels has a significant effect, with emoticons
generally providing higher accuracy than Weibo's smilies, and that the choice of text
segmentation method is crucial: current word segmentation tools give poor accuracy on
microblog text, and character-based features prove superior.</p>
      <p>Background</p>
      <p>Sentiment/Emotion Analysis</p>
      <p>
        Most research in this area focuses on sentiment analysis: classifying text as positive
or negative (Pang and Lee, 2008 [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]). However, finer-grained emotion detection is
required to provide cues for further human-computer interaction, and is critical for the
development of intelligent interfaces. It is hard to reach a consensus on how basic
emotions should be categorised; here we follow (Chuang and Wu, 2004 [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]) and
others in using the definition of (Ekman, 1972 [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]), which gives six basic emotions: anger,
disgust, fear, happiness, sadness and surprise.
      </p>
      <p>Distant Supervision</p>
      <p>
        Distant supervision is a semi-supervised learning approach in which a standard
supervised classifier is trained on a weakly labeled dataset, whose labels are derived
automatically from noisy cues rather than from manual annotation. (Go et al., 2009 [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]) and (Pak
and Paroubek, 2010 [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]), following (Read, 2005 [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]), use emoticons to provide these
labels, classifying positive/negative sentiment in Twitter messages with over 80%
accuracy.
      </p>
      <p>
        (Yuasa et al., 2006 [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]) showed that emoticons play an important role in
emphasizing the emotions conveyed in a sentence; they can therefore give us direct access to
the authors’ own emotions. (Purver and Battersby, 2012 [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]) thus used a broader set
of emoticons to extend the distant supervision approach to six-way emotion
classification in English, and we apply a similar approach. However, in addition to the widely
used, domain-independent emoticons, other markers have emerged for particular
interfaces or domains. Sina Weibo provides a built-in set of smilies that can act as
special emoticons and help us better understand authors’ emotions.
      </p>
      <p>Chinese Text Processing</p>
      <p>
        Chinese sentences are written as strings of characters without the explicit word
delimiters (such as white space) used in English. It is therefore important to determine
word boundaries before running any word-based linguistic processing on Chinese. There
is a large body of research into Chinese word segmentation
(Fan and Tsai, 1988 [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]; Sproat and Shih, 1990 [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]; Gan et al., 1996 [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]; Guo, 1997
[
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]; Jin and Chen, 1998 [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]; Wu, 2003 [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]). The basic technique for identifying distinct words is lexicon-based
identification (Chen and Liu, 1992), which segments text by matching input character
strings against a known lexicon. However, the real-world lexicon is open-ended: new
words appear every day, especially in social media, so a lexicon for such a domain is
difficult to construct and to maintain accurately.
      </p>
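      <p>The lexicon-based matching described above can be illustrated with forward maximum matching, the simplest scheme of this kind. This is a minimal sketch under stated assumptions: the function name, the toy lexicons and the <italic>max_len</italic> window of four characters are illustrative, and it is not the exact algorithm of any tool used in this paper (MMSEG, for instance, builds on two variants of maximum matching with additional disambiguation rules). The second call shows how a new word missing from the lexicon, here the internet slang word 给力 (gei-li), is broken into single characters.</p>
      <preformat preformat-type="code">
```python
def forward_max_match(text, lexicon, max_len=4):
    """Segment `text` by forward maximum matching against `lexicon`.

    At each position, take the longest lexicon word (up to `max_len`
    characters) that starts there; if none matches, emit one character.
    """
    tokens = []
    i, n = 0, len(text)
    while i != n:
        # Try the longest candidate first, shrinking down to one character.
        for size in range(min(max_len, n - i), 0, -1):
            word = text[i:i + size]
            if size == 1 or word in lexicon:
                tokens.append(word)
                i += size
                break
    return tokens


print(forward_max_match("我喜欢刷微博", {"喜欢", "微博"}))
# ['我', '喜欢', '刷', '微博']
print(forward_max_match("这个很给力", {"这个"}))
# ['这个', '很', '给', '力']  (the unknown word is split into characters)
```
      </preformat>
      <p>Adding 给力 to the lexicon fixes this one case, but every newly coined word needs the same manual update, which is the maintenance problem noted above.</p>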
      <p>Data</p>
    </sec>
    <sec id="sec-2">
      <title>Corpus Collection</title>
      <p>Our training data consisted of Weibo statuses containing emoticons and smilies. Since
Weibo has a public API, training data can be obtained automatically. We
wrote a script that requested the statuses/public_timeline API<sup>1</sup> every two minutes
and inserted the collected data into a MySQL database. We collected a corpus of
Weibo data, filtering out messages that did not contain emotion labels (see below and
Table 2 for details).</p>
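      <p>The collection script described above can be sketched as a polling loop. This is a minimal sketch under several assumptions: <italic>fetch_statuses</italic> is a hypothetical callable standing in for an authenticated request to the public_timeline API (its parameters and response format are not shown here), each status is assumed to be a dictionary with <italic>id</italic> and <italic>text</italic> keys, SQLite from the Python standard library stands in for MySQL (MySQL would use INSERT IGNORE rather than SQLite's INSERT OR IGNORE), and the label markers are a small illustrative subset of Table 1.</p>
      <preformat preformat-type="code">
```python
import sqlite3
import time

# Illustrative subset of label markers (assumption; the full set is in Table 1).
LABEL_MARKERS = ("(^_^)", "(T_T)", "[哈哈]", "[泪]")


def open_store(path=":memory:"):
    """Open a SQLite store (standing in for the MySQL database)."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS statuses (id INTEGER PRIMARY KEY, text TEXT)"
    )
    return conn


def has_label(text):
    """True if the status contains at least one emotion-label marker."""
    return any(marker in text for marker in LABEL_MARKERS)


def collect_once(fetch_statuses, conn):
    """Fetch one batch, store the labelled statuses, return how many had labels."""
    labelled = [s for s in fetch_statuses() if has_label(s["text"])]
    # INSERT OR IGNORE skips statuses already stored by an earlier poll.
    conn.executemany(
        "INSERT OR IGNORE INTO statuses (id, text) VALUES (?, ?)",
        [(s["id"], s["text"]) for s in labelled],
    )
    conn.commit()
    return len(labelled)


def poll(fetch_statuses, conn, rounds, interval_s=120):
    """Call collect_once `rounds` times, sleeping interval_s seconds between calls."""
    total = 0
    for r in range(rounds):
        total += collect_once(fetch_statuses, conn)
        if r != rounds - 1:
            time.sleep(interval_s)
    return total
```
      </preformat>
      <p>Because the public timeline can return the same status in consecutive polls, the primary key on the status id keeps each status only once, while unlabelled statuses are never stored.</p>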
    </sec>
    <sec id="sec-3">
      <title>Emotion Labels</title>
      <p>
        We used two kinds of emotion labels, emoticons and smilies, as noisy labels. The
emoticons and smilies are themselves noisy, being ambiguous or vague: not all of them are
closely related to the emotion classes, and some are used in different situations because
different people understand them differently. The emoticons here are Eastern-style
emoticons, which are very different from Western-style ones (see e.g. Kayan et al., 2006 [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]). The smilies are Sina Weibo's
built-in smilies. Initial investigation found that not all emoticons and smilies can be
classified into Ekman’s six emotion classes, and that authors have widely different
understandings of some of the less frequently used labels. We therefore identified the
most widely used and well-known emoticons and smilies to use as labels (Table 1).
      </p>
      <table-wrap id="tab1">
        <label>Table 1</label>
        <caption>
          <p>Emotion labels used for each emotion class. Smilies are given with their pinyin and an English gloss.</p>
        </caption>
        <table>
          <thead>
            <tr>
              <th>Emotion</th>
              <th>Emoticons</th>
              <th>Smilies</th>
            </tr>
          </thead>
          <tbody>
            <tr>
              <td>surprise</td>
              <td>OMG; (0.o); O_o</td>
              <td>[chi-jing “surprise”]</td>
            </tr>
            <tr>
              <td>disgust</td>
              <td>N/A</td>
              <td>[bi-shi “disdain”]; [tu “sick”]</td>
            </tr>
            <tr>
              <td>happy</td>
              <td>(^_^); (*^__^*); (^o^); (^.^); O(∩_∩)O</td>
              <td>[xi-xi “heehee”]; [ha-ha “haha”]; [gu-zhang “applaud”]; [da-kai-xin “so happy”]</td>
            </tr>
            <tr>
              <td>angry</td>
              <td>͐ (͒ ^͓ )͑ ; (͓ _͒ )</td>
              <td>[nu “anger”]; [nu-ma “curse”]; [heng “humph”]</td>
            </tr>
            <tr>
              <td>fear</td>
              <td colspan="2">Keyword only: hai-pai “fear”</td>
            </tr>
            <tr>
              <td>sad</td>
              <td>(T_T); (T.T); (T^T); (෌ .෌ ); (͒Σ͓ ); (ΎΗ )</td>
              <td>[lei “tear”]; [shi-wang “disappointed”]; [bei-shang “sad”]</td>
            </tr>
          </tbody>
        </table>
      </table-wrap>
      <p><sup>1</sup> http://open.weibo.com/wiki/2/statuses/public_timeline</p>
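      <p>Turning these markers into training labels can be sketched as follows. This is a minimal sketch, not the authors' code: the marker lists are a partial, illustrative subset of Table 1 (the Chinese forms of the smilies ha-ha, lei and nu are written out for illustration), markers are replaced by a space, and a status whose markers point to more than one class is assumed to be discarded, since the paper does not say how such conflicts were handled. Removing the markers means the classifier must rely on the remaining text rather than on the labels themselves.</p>
      <preformat preformat-type="code">
```python
import re

# Illustrative subset of Table 1 (assumption: not the full label set).
LABELS = {
    "happy": ["(^_^)", "(*^__^*)", "(^o^)", "(^.^)", "O(∩_∩)O", "[哈哈]"],
    "sad": ["(T_T)", "(T.T)", "(T^T)", "[泪]"],
    "angry": ["[怒]"],
}

MARKER_TO_CLASS = {m: c for c, markers in LABELS.items() for m in markers}
# Longest markers first, so a longer marker wins over any shorter one it contains.
MARKER_RE = re.compile(
    "|".join(re.escape(m) for m in sorted(MARKER_TO_CLASS, key=len, reverse=True))
)


def extract_label(text):
    """Return (emotion class, text without markers), or (None, text).

    Statuses with no marker, or with markers from more than one class,
    get None: we assume such statuses are not used for training.
    """
    classes = {MARKER_TO_CLASS[m] for m in MARKER_RE.findall(text)}
    if len(classes) != 1:
        return None, text
    return classes.pop(), MARKER_RE.sub(" ", text).strip()
```
      </preformat>
      <p>For example, a status ending in (^_^) yields the class “happy” and the text without the emoticon, while a status containing both (^_^) and (T_T) yields no label.</p>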
      <sec id="sec-3-1">
        <title>Text Processing</title>
        <p>We used a Chinese-language filter to remove characters and words in other languages,
and also removed URLs, Weibo usernames (starting with @), digits and other symbols
(e.g. *), leaving only Chinese characters. We then removed the emoticons and smilies
from the texts, replacing them with positive/negative labels for the relevant emotion
classes for training and testing. Finally, we extracted different kinds of lexical
features: segmented Chinese words, Chinese characters, and higher-order n-grams.</p>
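        <p>The filtering and character-based feature extraction can be sketched as follows. This is a minimal sketch under stated assumptions: “Chinese characters” is approximated by the CJK Unified Ideographs block (U+4E00 to U+9FFF), the URL and username patterns are simplified, n-grams are counted only within runs of consecutive Chinese characters so that none spans removed material, and <italic>char_ngrams</italic> (a hypothetical name) includes every lower order, in line with the inclusion of lower-order n-grams described in the experiments. Emotion labels are assumed to have been extracted beforehand, since this filter discards emoticons, while smilies such as [哈哈] contain Chinese characters and would otherwise survive as text.</p>
        <preformat preformat-type="code">
```python
import re
from collections import Counter

# Simplified patterns (assumptions): URLs are runs of printable ASCII after
# http(s)://, usernames are @ followed by word characters or hyphens.
URL_RE = re.compile(r"https?://[\x21-\x7e]+")
USER_RE = re.compile(r"@[\w-]+")
CHINESE_RUN_RE = re.compile(r"[\u4e00-\u9fff]+")


def chinese_runs(text):
    """Drop URLs and @usernames, then return the runs of Chinese characters.

    Digits, Latin letters, punctuation and other symbols act as run
    boundaries and are discarded.
    """
    text = URL_RE.sub(" ", text)
    text = USER_RE.sub(" ", text)
    return CHINESE_RUN_RE.findall(text)


def char_ngrams(text, n=3):
    """Count character n-grams of every order from 1 to n within each run."""
    feats = Counter()
    for run in chinese_runs(text):
        for k in range(1, n + 1):
            for i in range(len(run) - k + 1):
                feats[run[i:i + k]] += 1
    return feats
```
        </preformat>
        <p>The resulting counts can be fed to any linear classifier as a sparse feature vector; with <italic>n=1</italic> the function gives the plain character features, and no word segmentation is needed at any point.</p>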
        <p>
          Word-based features require the sentences to be segmented. There are many
Chinese word segmentation tools, but many are unsuitable for online social media
text; we chose pymmseg, smallseg and the Stanford Chinese Word Segmenter,
which all appeared to give reasonable results. Pymmseg uses the MMSEG algorithm
(Tsai, 2000 [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ]). Smallseg is an open-source Chinese segmentation tool based on a
deterministic finite automaton (DFA). The Stanford Segmenter is based on conditional
random fields (CRFs) (Tseng et al., 2005 [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ]).
        </p>
      </sec>
      <sec id="sec-3-2">
        <title>Corpus Analysis</title>
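        <p>
          Our database contains 229,062 Weibo statuses with emotion labels; Table 2 shows
statistics. The number of Weibo statuses varies with the popularity of the labels
themselves: “happy” and “sad” labels are much more frequent than the others. Very
similar results are observed in English Twitter statuses (see e.g. [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]), suggesting that these relative
frequencies are fairly stable across very different languages.
        </p>
        <table-wrap id="tab2">
          <label>Table 2</label>
          <caption>
            <p>Number of Weibo statuses per emotion class, by label type.</p>
          </caption>
          <table>
            <thead>
              <tr>
                <th>Emotion</th>
                <th>Total</th>
                <th>Emoticons</th>
                <th>Smilies</th>
              </tr>
            </thead>
            <tbody>
              <tr>
                <td>surprise</td>
                <td>347</td>
                <td>63</td>
                <td>284</td>
              </tr>
              <tr>
                <td>disgust</td>
                <td>142</td>
                <td>N/A</td>
                <td>142</td>
              </tr>
              <tr>
                <td>happy</td>
                <td>5685</td>
                <td>712</td>
                <td>4973</td>
              </tr>
              <tr>
                <td>angry</td>
                <td>2318</td>
                <td>9</td>
                <td>2305</td>
              </tr>
              <tr>
                <td>fear</td>
                <td>480</td>
                <td colspan="2">Keywords: 480</td>
              </tr>
              <tr>
                <td>sad</td>
                <td>5422</td>
                <td>1064</td>
                <td>4358</td>
              </tr>
            </tbody>
          </table>
        </table-wrap>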
        <p>Overall frequencies show that Weibo users are more likely to use the built-in
smilies than emoticons. One possible reason is that smilies can be inserted with
a single mouse click, whereas emoticons must be typed using several keystrokes:
Eastern-style emoticons are usually made of five or more characters.</p>
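        <p>As a rough check of this observation, the counts in Table 2 can be summed by label type. This small worked example is not part of the original analysis, and two choices in it are our own reading of the table: fear is left out because its statuses were collected by keyword, and the “N/A” emoticon count for disgust is treated as zero.</p>
        <preformat preformat-type="code">
```python
# Emoticon and smiley counts per emotion class, copied from Table 2.
# Fear is left out (keyword-based); disgust's "N/A" emoticons are read as 0.
TABLE2 = {
    "surprise": (63, 284),
    "disgust": (0, 142),
    "happy": (712, 4973),
    "angry": (9, 2305),
    "sad": (1064, 4358),
}


def label_totals(counts):
    """Return (emoticon total, smiley total) summed over emotion classes."""
    emoticons = sum(e for e, _ in counts.values())
    smilies = sum(s for _, s in counts.values())
    return emoticons, smilies


def smiley_share(counts):
    """Fraction of label-bearing statuses whose label is a smiley."""
    emoticons, smilies = label_totals(counts)
    return smilies / (emoticons + smilies)


def smilies_dominate_every_class(counts):
    """True if smilies outnumber emoticons in every emotion class."""
    return all(s > e for e, s in counts.values())


print(label_totals(TABLE2))                  # (1848, 12062)
print(round(smiley_share(TABLE2), 3))        # 0.867
print(smilies_dominate_every_class(TABLE2))  # True
```
        </preformat>
        <p>Across these five classes, about 87% of the labelled statuses carry a smiley, and smilies outnumber emoticons in every class.</p>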
        <p>Experiments and Discussions</p>
        <p>
          Classification used support vector machines (SVMs) (Vapnik, 1995 [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ])
throughout, implemented with the LibSVM tools (Chang and Lin, 2001 [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ]). Performance
was evaluated using 10-fold cross-validation. Our datasets were balanced: a
dataset of size N contained N/2 positive instances (statuses containing labels for the
target emotion class) and N/2 negative ones (statuses containing labels from other classes).
For larger datasets (N&gt;1200), the N/2 negative instances were selected at random from
the other emotion classes; for smaller datasets, we ensured an even weighting across the
negative classes to prevent bias towards any one of them. Because the emotion labels
differ greatly in frequency, we focused mainly on “happy”, “angry” and “sad”, and
present tentative results for the other emotion classes.
        </p>
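        <p>The dataset construction and evaluation protocol described above can be sketched as follows. This is our reading of the description rather than the authors' code: the function names are hypothetical, N is assumed to be even, “even weighting” is interpreted as splitting the N/2 negatives as equally as possible across the other classes, and folds are formed by striding over the already shuffled data. The SVM itself (LibSVM in the paper) is not shown; each (train, test) pair would be used to train and score one classifier.</p>
        <preformat preformat-type="code">
```python
import random


def balanced_dataset(by_class, target, n, seed=0):
    """Build a balanced one-vs-rest dataset of size n for class `target`.

    `by_class` maps each emotion class to a list of statuses. Returns a
    shuffled list of (status, label) pairs: n // 2 positives (label 1) from
    `target` and n // 2 negatives (label 0) from the other classes.
    """
    rng = random.Random(seed)
    half = n // 2
    positives = rng.sample(by_class[target], half)
    others = [c for c in by_class if c != target]
    if n > 1200:
        # Larger sets: sample negatives at random from all other classes pooled.
        pool = [s for c in others for s in by_class[c]]
        negatives = rng.sample(pool, half)
    else:
        # Smaller sets: split the negatives as evenly as possible across classes.
        per_class, extra = divmod(half, len(others))
        negatives = []
        for i, c in enumerate(others):
            k = per_class + (1 if extra > i else 0)
            negatives.extend(rng.sample(by_class[c], k))
    data = [(s, 1) for s in positives] + [(s, 0) for s in negatives]
    rng.shuffle(data)
    return data


def k_fold_splits(n_items, k=10):
    """Yield (train, test) index lists for k-fold cross-validation."""
    indices = list(range(n_items))
    for fold in range(k):
        test = indices[fold::k]
        held_out = set(test)
        train = [i for i in indices if i not in held_out]
        yield train, test
```
        </preformat>
        <p>With three negative classes and N=20, for instance, the ten negatives are split 4/3/3, so no single negative class dominates a small dataset.</p>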
        <p>Segmented Words vs. Characters</p>
        <p>In the first experiment, we investigated the effect of different segmentation tools and
compared word-based with character-based features.</p>
        <p>After testing on “angry”, “happy” and “sad”, we found that pymmseg
outperformed the other tools, so we used pymmseg in later experiments. However,
as we increased the dataset size, character-based features performed even better
than word features (using pymmseg) for all three classes. Our results suggest that
Chinese characters can be used directly, without any word segmentation (Figure 1).</p>
        <p>Examination of the segmented data showed that the segmentation tools did not work
well on our social media data and made many errors. In addition, all the segmentation
tools produced many “words” that were in fact single characters. We therefore preferred
character-based features.</p>
        <p>In the second experiment, we tried to improve the overall performance.</p>
        <p>
          Whether higher-order n-grams are useful features appears to be a matter of some
debate. (Pang et al., 2002 [
          <xref ref-type="bibr" rid="ref21">21</xref>
          ]) report that unigrams outperform bigrams when
classifying movie reviews by sentiment polarity, but (Dave et al., 2003 [
          <xref ref-type="bibr" rid="ref22">22</xref>
          ]) find that
bigrams and trigrams can give better product-review polarity classification.
        </p>
        <p>Our results show that higher-order n-grams are useful features for wide-topic
social media data such as Weibo: bigrams and trigrams outperform unigrams for all
three emotion classes (Figure 2). In our experiments with bigram and trigram
features, we also included the lower-order n-grams (unigrams, bigrams), since many
Chinese words consist of a single character. Our experiments also showed that
accuracy increased with dataset size (Figures 1 and 2); as our datasets grow over
time, we therefore expect further improvements in accuracy.</p>
        <table-wrap>
          <caption>
            <p>Accuracy for “happy” and “sad” by type of emotion label.</p>
          </caption>
          <table>
            <thead>
              <tr>
                <th>Emotion</th>
                <th>Mixed</th>
                <th>Emoticons</th>
                <th>Smilies</th>
              </tr>
            </thead>
            <tbody>
              <tr>
                <td>happy</td>
                <td>73.8%</td>
                <td>85.9%</td>
                <td>74.6%</td>
              </tr>
              <tr>
                <td>sad</td>
                <td>62.8%</td>
                <td>67.5%</td>
                <td>66.0%</td>
              </tr>
            </tbody>
          </table>
        </table-wrap>
        <p>Statuses labelled with emoticons were classified more accurately than those labelled
with smilies. Inspection of the data showed that people use emoticons in a more
systematic and consistent way: they use emoticons to tell others what their real emotions
are (“happy”, “sad” etc.), whereas they use smilies for a much wider range of purposes,
such as jokes and sarcasm. Some people use smilies just to make their Weibo statuses
more interesting and lively, apparently without expressing any subjective feelings.</p>
        <p>Conclusion</p>
        <p>We used SVMs to detect emotions automatically in Chinese microblog texts. Our
results show that using emoticons and smilies as noisy labels is an effective way to
perform distant supervision for Chinese. Emoticons seem to be more reliable labels for
emotion detection than smilies. We also found that many Chinese word segmentation
tools do not work well on social media data; characters can be used as lexical features
instead, and performance improves with higher-order n-grams. Increasing the dataset
size also improves performance, and our future work will examine larger datasets.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>Kenneth</given-names>
            <surname>Rapoza</surname>
          </string-name>
          :
          <article-title>China's Weibos vs US's Twitter: And the Winner Is?</article-title>
          http://www.forbes.com/sites/kenrapoza/2011/05/17/chinas-weibos-vs-uss-twitter-and-the-winner-is/ (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>Matthew</given-names>
            <surname>Purver</surname>
          </string-name>
          and
          <string-name>
            <given-names>Stuart</given-names>
            <surname>Battersby</surname>
          </string-name>
          :
          <article-title>Experimenting with Distant Supervision for Emotion Classification</article-title>
          .
          <source>In: 13th Conference of the European Chapter of the Association for Computational Linguistics</source>
          . (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>Alec</given-names>
            <surname>Go</surname>
          </string-name>
          , Richa Bhayani, and Lei Huang:
          <article-title>Twitter sentiment classification using distant supervision</article-title>
          .
          <source>Master's thesis</source>
          , Stanford University. (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>Bo</given-names>
            <surname>Pang</surname>
          </string-name>
          and
          <string-name>
            <given-names>Lillian</given-names>
            <surname>Lee</surname>
          </string-name>
          :
          <article-title>Opinion mining and sentiment analysis</article-title>
          .
          <source>Foundations and Trends in Information Retrieval</source>
          ,
          <volume>2</volume>
          (
          <issue>1-2</issue>
          ):
          <fpage>1</fpage>
          -
          <lpage>135</lpage>
          . (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>Ze-Jing</given-names>
            <surname>Chuang</surname>
          </string-name>
          and
          <string-name>
            <given-names>Chung-Hsien</given-names>
            <surname>Wu</surname>
          </string-name>
          :
          <article-title>Multi-modal emotion recognition from speech and text</article-title>
          .
          <source>In: Computational Linguistics and Chinese Language</source>
          ,
          <volume>9</volume>
          (
          <issue>2</issue>
          ):
          <fpage>45</fpage>
          -
          <lpage>62</lpage>
          . (
          <year>2004</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>Paul</given-names>
            <surname>Ekman</surname>
          </string-name>
          :
          <article-title>Universals and cultural differences in facial expressions of emotion</article-title>
          . In J. Cole, editor,
          <source>Nebraska Symposium on Motivation 1971</source>
          , volume
          <volume>19</volume>
          . University of Nebraska Press. (
          <year>1972</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>Alexander</given-names>
            <surname>Pak</surname>
          </string-name>
          and Patrick Paroubek:
          <article-title>Twitter as a corpus for sentiment analysis and opinion mining</article-title>
          .
          <source>In: 7th Conference on International Language Resources and Evaluation</source>
          . (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>Jonathon</given-names>
            <surname>Read</surname>
          </string-name>
          :
          <article-title>Using emoticons to reduce dependency in machine learning techniques for sentiment classification</article-title>
          .
          <source>In: 43rd Meeting of the Association for Computational Linguistics</source>
          . (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>Masahide</given-names>
            <surname>Yuasa</surname>
          </string-name>
          , Keiichi Saito and Naoki Mukawa:
          <article-title>Emoticons convey emotions without cognition of faces: an fMRI study</article-title>
          .
          <source>CHI EA '06. ISBN: 1-59593-298-4</source>
          , doi: 10.1145/1125451.1125737 (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Fan</surname>
            ,
            <given-names>C. K.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Tsai</surname>
            ,
            <given-names>W. H.</given-names>
          </string-name>
          :
          <article-title>Automatic word identification in Chinese sentences by the relaxation technique</article-title>
          .
          <source>Computer Processing of Chinese &amp; Oriental Languages</source>
          ,
          <volume>4</volume>
          ,
          <fpage>33</fpage>
          -
          <lpage>56</lpage>
          . (
          <year>1988</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <given-names>Richard</given-names>
            <surname>Sproat</surname>
          </string-name>
          and
          <string-name>
            <given-names>Chilin</given-names>
            <surname>Shih</surname>
          </string-name>
          :
          <article-title>A Statistical Method for Finding Word Boundaries in Chinese Text</article-title>
          .
          <source>Computer Processing of Chinese and Oriental Languages</source>
          ,
          <volume>4</volume>
          ,
          <fpage>336</fpage>
          -
          <lpage>351</lpage>
          . (
          <year>1990</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <given-names>Kok-Wee</given-names>
            <surname>Gan</surname>
          </string-name>
          , Martha Palmer, and
          <string-name>
            <given-names>Kim-Teng</given-names>
            <surname>Lua</surname>
          </string-name>
          :
          <article-title>A statistically emergent approach for language processing: Application to modeling context effects in ambiguous Chinese word boundary perception</article-title>
          .
          <source>Computational Linguistics</source>
          ,
          <volume>22</volume>
          (
          <issue>4</issue>
          ):
          <fpage>531</fpage>
          -
          <lpage>553</lpage>
          . (
          <year>1996</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <given-names>Jin</given-names>
            <surname>Guo</surname>
          </string-name>
          :
          <article-title>Critical tokenization and its properties</article-title>
          .
          <source>Computational Linguistics</source>
          ,
          <volume>23</volume>
          (
          <issue>4</issue>
          ):
          <fpage>569</fpage>
          -
          <lpage>596</lpage>
          . (
          <year>1997</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <given-names>Wangying</given-names>
            <surname>Jin</surname>
          </string-name>
          and Lei Chen:
          <article-title>Identifying unknown words in Chinese corpora</article-title>
          .
          <source>In: First Workshop on Chinese Language</source>
          , University of Pennsylvania, Philadelphia. (
          <year>1998</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <given-names>Andi</given-names>
            <surname>Wu</surname>
          </string-name>
          :
          <article-title>Customizable segmentation of morphologically derived words in Chinese</article-title>
          .
          <source>In: Computational Linguistics and Chinese Language</source>
          ,
          <volume>8</volume>
          (
          <issue>2</issue>
          ). (
          <year>2003</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <given-names>Shipra</given-names>
            <surname>Kayan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Susan R.</given-names>
            <surname>Fussell</surname>
          </string-name>
          and
          <string-name>
            <given-names>Leslie D.</given-names>
            <surname>Setlock</surname>
          </string-name>
          :
          <article-title>Cultural differences in the use of instant messaging in Asia and North America</article-title>
          .
          <source>In: 20th anniversary conference on Computer supported cooperative work</source>
          , Banff, Alberta, Canada. (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <given-names>Chih-Hao</given-names>
            <surname>Tsai</surname>
          </string-name>
          :
          <article-title>MMSEG: A Word Identification System for Mandarin Chinese Text Based on Two Variants of the Maximum Matching Algorithm</article-title>
          . http://technology.chtsai.org/mmseg/ (
          <year>2000</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <given-names>Huihsin</given-names>
            <surname>Tseng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Pichuan</given-names>
            <surname>Chang</surname>
          </string-name>
          , Galen Andrew, Daniel Jurafsky and Christopher Manning:
          <article-title>A Conditional Random Field Word Segmenter</article-title>
          .
          <source>In: Fourth SIGHAN Workshop on Chinese Language Processing</source>
          . (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <given-names>Vladimir N.</given-names>
            <surname>Vapnik</surname>
          </string-name>
          :
          <article-title>The Nature of Statistical Learning Theory</article-title>
          . (
          <year>1995</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <given-names>Chih-Chung</given-names>
            <surname>Chang</surname>
          </string-name>
          and
          <string-name>
            <given-names>Chih-Jen</given-names>
            <surname>Lin</surname>
          </string-name>
          :
          <article-title>LIBSVM: a library for Support Vector Machines</article-title>
          . http://www.csie.ntu.edu.tw/~cjlin/libsvm/ (
          <year>2001</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <given-names>Bo</given-names>
            <surname>Pang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Lillian</given-names>
            <surname>Lee</surname>
          </string-name>
          , and Shivakumar Vaithyanathan:
          <article-title>Thumbs up? Sentiment classification using machine learning techniques</article-title>
          .
          <source>In: Conference on Empirical Methods in Natural Language Processing</source>
          , pages
          <fpage>79</fpage>
          -
          <lpage>86</lpage>
          . (
          <year>2002</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <given-names>Kushal</given-names>
            <surname>Dave</surname>
          </string-name>
          , Steve Lawrence, and David M. Pennock:
          <article-title>Mining the peanut gallery: Opinion extraction and semantic classification of product reviews</article-title>
          .
          <source>In: WWW</source>
          , pages
          <fpage>519</fpage>
          -
          <lpage>528</lpage>
          . (
          <year>2003</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>