<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Methods for Automatic Sentiment Detection</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>Plekhanov Russian University of Economics</institution>
          ,
          <addr-line>36 Stremyanny lane, Moscow, 115998</addr-line>
          ,
          <country country="RU">Russia</country>
        </aff>
      </contrib-group>
      <fpage>0000</fpage>
      <lpage>0003</lpage>
      <abstract>
        <p>Semantic analysis has great potential applications in many fields of science and the national economy. Much of the world's information is unstructured, which creates the problem of processing it and extracting useful data, and natural language processing is a very complex process. On November 23, 2017, as part of the 2017 Runet Prize award ceremony, the HOT-LIST 2018 was presented; it identified the main digital trends of 2018 and listed trendsetters in 10 technology, innovation and business areas: ten leading companies in each of the ten areas, the first of which is AI technologies. AI became a key technology trend in 2018, with global investment in these technologies and the products based on them exceeding $1 billion. [25] According to IDC Customer Insights &amp; Analysis, more than 180 private companies working on AI-technology projects were acquired during 2011-2018. [25] According to forecasts by Frost &amp; Sullivan, by 2022 the artificial intelligence market will grow to $10 billion, applying machine learning and natural language recognition in advertising, retail, finance and healthcare. [25]</p>
      </abstract>
      <kwd-group>
        <kwd>sentiment analysis</kwd>
        <kwd>rule-based approach</kwd>
        <kwd>sentiment lexicon</kwd>
        <kwd>machine learning</kwd>
        <kwd>dictionary approaches for determining the sentiment of text</kwd>
        <kwd>comparison of methods for automatic sentiment determination</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Semantic analysis has great potential applications in many fields of science and the
national economy. It is currently used in monitoring, analytics, alerting systems,
document management systems, advertising platforms, and much more. Sentiment
analysis of text consists in extracting opinions and emotions from the text and
processing them; it belongs to the methods of content analysis and is a means of
studying subjectivity in natural language. [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]
Much of the world's information is unstructured, which creates the problem of
processing it and extracting useful data. Natural language processing is a very complex
process, and numerous methods exist. One option is to use lexical and
grammatical structures together with a sentiment lexicon. Machine-learning techniques
can also be applied; in that approach, sentiment analysis requires a set of texts as a
training sample, and supervised machine learning needs a collection of pre-labeled
text reviews.
      </p>
      <p>
        On November 23, 2017, as part of the 2017 Runet Prize award ceremony, the
HOT-LIST 2018 was presented; it identified the main digital trends of 2018 and
listed trendsetters in 10 technology, innovation and business areas: ten leading
companies in each of the ten areas, the first of which is AI technologies.
AI became a key technology trend in 2018, with global investment in these
technologies and the products based on them exceeding $1 billion. [25]
According to IDC Customer Insights &amp; Analysis, more than 180 private companies
working on AI-technology projects were acquired during 2011-2018. [25]
According to forecasts by Frost &amp; Sullivan, by 2022 the artificial intelligence
market will grow to $10 billion, applying machine learning and natural language
recognition in advertising, retail, finance and healthcare. [25]
The dynamics of artificial intelligence rest on five fundamental technologies:
machine learning, [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]
deep learning, [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]
computer vision, [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]
natural language processing, [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]
and machine reasoning together with strong artificial intelligence. [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]
The main drivers of the market will be the consumer products, business services,
advertising and defense sectors. The natural language processing (NLP) market and
the products built on it were estimated by experts at around $8 billion in 2018 and
are expected to grow to $40 billion by 2025. [25] The main drivers will be growing
demand for a more advanced user experience, increased use of smart devices, growth of
investment in healthcare, growing use of networked and cloud-based business
applications, and the growth of M2M technology. NLP market growth is constrained by
factors such as the gap in perception, understanding and recognition of
textual information between humans and machines, the shortage of personnel and training
programs for NLP researchers, and the difficulty of machine processing
and understanding of the context and meaning of text. [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] Another challenge
in the natural language processing segment is the creation of universal
language models and architectures that can address a variety of text-related tasks
within a single system, that is, a system that "understands" textual information and
can communicate with a person the way another person would after reading the text
and possessing some amount of knowledge. [11] Certain restrictions
apply specifically to understanding the Russian language; in this case, the quality of
understanding depends on many factors: the language, the national culture of the
interlocutor, etc. One of the major technology trends in the natural language processing
segment today is the use of machine-learning techniques to reduce the labor cost of text
annotation: unsupervised or semi-supervised machine learning, active
machine learning methods, etc. [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] Vector representations of words and other language
constructions, that is, deep machine learning and neural networks, have shown high
efficiency in language processing tasks. Many natural
language processing tasks today are therefore solved with vector representations and
deep learning of neural networks. Another recent trend is transfer learning, in which
NLP models are first trained to solve simple tasks on large amounts of data and
these pretrained models are then used for other, more specific tasks. [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]
      </p>
    </sec>
    <sec id="sec-2">
      <title>The main approaches to determining tonality</title>
      <p>Sentiment analysis is generally defined as one of the problems of computational
linguistics, meaning that the tone of a text can be found and classified using natural
language processing tools (taggers, parsers, etc.). At a high level of generalization,
the existing approaches can be divided into the following categories [18]:
1. Rule-based approaches;
2. Dictionary-based approaches;
3. Supervised machine learning;
4. Unsupervised machine learning.</p>
      <p>
        The first type of system consists of a set of rules; by applying them, the system
concludes the tone of the text. Many commercial systems use this approach, even though it is
costly, because good performance requires adding many rules to the
system. [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] The rules are often tied to a specific domain (e.g. "restaurant reviews"), and a
change of domain (e.g. to "camera reviews") requires remaking the rules. [15]
This approach is the most accurate when a good rule base is available, but it
is not interesting for research. Dictionary-based approaches use so-called tonal
dictionaries for text analysis. In its simplest form, a tonal dictionary is a list of words
with a tone value for each word. [24] To analyze a text, one can use the
following algorithm: first assign every word in the text its tonality value from
the dictionary (if it is present in the dictionary), then calculate the overall tone of
the text. The overall tone can be calculated in a variety of ways; the simplest is the
average of all values, and a more complex one is to train a classifier (e.g. a neural network).
[17]
Supervised machine learning is the most common method used in
research. Its essence is to train a machine classifier on a collection of
pre-labeled texts and then use the resulting model to analyze new documents. This
is the method in question here. [23]
Unsupervised machine learning is probably the most interesting and at the
same time the least accurate method of sentiment analysis. One example of this method
is automatic clustering of documents. [23]
Supervised machine learning. The process of creating a sentiment analysis system is
very similar to the process of creating other systems using machine learning: [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]
 Assemble a collection of documents for training the classifier;
 Represent each document from the training collection as a
feature vector;
 For each document, specify the «correct answer», i.e. the tone class (e.g. positive
or negative), on which the classifier will be trained;
 Select a classification algorithm and train the classifier;
 Use the resulting model.
The number of tone classes to distinguish is usually set by the system specification. For
example, a customer may require the system to distinguish three kinds of tone:
«positive», «neutral», «negative». Studies usually consider the problem of binary
tone classification, i.e. only two classes: «positive» and «negative». [10]
Tone classification into more than two classes is a very difficult task; even with
three classes it is very hard to achieve good accuracy regardless of the approach
used. The most interesting method is the dictionary-based one. [19]
      </p>
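      <p>The dictionary-based scoring described above (assign each word its dictionary value, then average the values) can be sketched as follows. The tiny lexicon and its weights are illustrative assumptions, not a published tonal dictionary.</p>
      <preformat>
```python
# Minimal sketch of the dictionary-based algorithm: look up each word's
# tonality value and average the values found. The lexicon below is an
# illustrative assumption, not a real tonal dictionary.
TONAL_LEXICON = {
    "good": 1.0,
    "excellent": 2.0,
    "bad": -1.0,
    "terrible": -2.0,
}

def text_tonality(text):
    """Assign each word its lexicon weight and average the values found."""
    scores = [TONAL_LEXICON[w] for w in text.lower().split() if w in TONAL_LEXICON]
    if not scores:
        return 0.0  # no sentiment-bearing words: treat as neutral
    return sum(scores) / len(scores)

print(text_tonality("the food was good but the service was terrible"))  # -> -0.5
```
      </preformat>
      <p>A classifier trained on such per-word scores, as mentioned above, would replace the simple average with a learned aggregation.</p>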
      <sec id="sec-2-1">
        <title>Approaches based on dictionaries. SentiStrength software</title>
        <p>According to research published in the article "Sentiment strength detection in short
informal text" in the Journal of the American Society for Information Science and
Technology [26], various algorithms were tested for determining the strength of positive
mood on 1041 comments, with an extended feature set and 10-fold
cross-validation (in descending order of positive-mood strength indicators). [20]
Except for SentiStrength, the results are the average of 4 runs with different random
test/training splits and an optimal number of features. The results are shown in
Table 1.</p>
      </sec>
      <sec id="sec-2-2">
        <title>Comparison of algorithms</title>
        <table-wrap id="table1">
          <label>Table 1.</label>
          <caption>
            <p>Accuracy of algorithms for detecting positive sentiment strength, reconstructed from the flattened source table (data from [26]); one correlation value (Baseline) could not be recovered.</p>
          </caption>
          <table>
            <thead>
              <tr><th>Algorithm</th><th>Features</th><th>Accuracy</th><th>Accuracy +/- 1 class</th><th>Corr.</th><th>Abs mean % error</th></tr>
            </thead>
            <tbody>
              <tr><td>SentiStrength (standard configuration, 30 runs)</td><td>-</td><td>60.6%</td><td>96.9%</td><td>.599</td><td>22.0%</td></tr>
              <tr><td>Simple logistic regression</td><td>700</td><td>58.5%</td><td>96.1%</td><td>.557</td><td>23.2%</td></tr>
              <tr><td>SVM (SMO)</td><td>800</td><td>57.6%</td><td>95.4%</td><td>.538</td><td>24.4%</td></tr>
              <tr><td>J48 classification tree</td><td>700</td><td>55.2%</td><td>95.9%</td><td>.548</td><td>24.7%</td></tr>
              <tr><td>JRip rule-based classifier</td><td>700</td><td>54.3%</td><td>96.4%</td><td>.476</td><td>28.2%</td></tr>
              <tr><td>SVM regression (SMO)</td><td>100</td><td>54.1%</td><td>97.3%</td><td>.469</td><td>28.2%</td></tr>
              <tr><td>AdaBoost</td><td>100</td><td>53.3%</td><td>97.5%</td><td>.464</td><td>28.5%</td></tr>
              <tr><td>Decision table</td><td>200</td><td>53.3%</td><td>96.7%</td><td>.431</td><td>28.2%</td></tr>
              <tr><td>Multilayer Perceptron</td><td>100</td><td>50.0%</td><td>94.1%</td><td>.422</td><td>30.2%</td></tr>
              <tr><td>Naive Bayes</td><td>100</td><td>49.1%</td><td>91.4%</td><td>.567</td><td>27.5%</td></tr>
              <tr><td>Baseline</td><td>-</td><td>47.3%</td><td>94.0%</td><td>-</td><td>31.2%</td></tr>
              <tr><td>Random</td><td>-</td><td>19.8%</td><td>56.9%</td><td>.016</td><td>82.5%</td></tr>
            </tbody>
          </table>
        </table-wrap>
        <p>For the strength of negative emotions, most methods give very similar results,
and some outperform SentiStrength. Although SentiStrength reaches 72.8% accuracy
there, this is only 2.9% above the baseline; several other methods reach similar
accuracy, and SVM is significantly more accurate. SentiStrength is the most
accurate of the methods when a one-class error is allowed and has the highest
correlation with human coding results. In theory, none of the methods should perform
worse than the baseline; where they do, the cause may be optimization on the training
set rather than on the evaluation set. Overall, SentiStrength may appear weak at
recognizing negative emotions, but this is a difficult task for the short texts analyzed
here. The average percent absolute error for the random category exceeds 100% for
negative sentiment because "1" predominates as the correct category there. Thus,
systems based on the dictionary approach are no worse than systems using machine
learning, and sometimes better; it depends on the specific task.</p>
      </sec>
      <sec id="sec-2-4">
        <title>Supervised machine learning. General scheme</title>
        <p>This is the most common method. Its essence is to train a classification algorithm
on a collection of documents whose classes are known in advance. [22]
Advantages:
 High accuracy in determining tonality;
 The problem of dependence on a specific subject area is solved by training the
classifier on a sample from that area; since the classifier itself selects the features
that affect sentiment, many studies are carried out in order to improve
accuracy.</p>
        <p>Disadvantages:
 A labeled collection of texts is needed (labeling is a very time-consuming
process).</p>
        <p>The algorithm of this approach can be described as follows:
 First, choose a collection of documents on which the
classifier will be trained;
 Represent each document as a vector of features (aspects);
 Assign each document the correct sentiment class;
 Choose a classification algorithm and a method for training the
classifier;
 Apply the resulting model.</p>
        <p>It is necessary to decide how many classes and what type of classification will be
used. It is difficult to obtain high results using flat classification; research shows
that the best results are obtained with hierarchical classification. All documents in
the training set must be represented as n-dimensional vectors of aspects. [21] The
quality of the results directly depends on which set of features is used. The most
common ways of representing documents are the bag-of-words and n-gram forms. [13]
Two classification algorithms are commonly used: the naive Bayes classifier and the
support vector machine (SVM). After the classification algorithm is chosen and the
classifier is trained, the results are assessed with the help of cross-validation. [12]
The accuracy is computed as the share of correctly classified documents:
Accuracy = (number of correctly classified documents) / (total number of documents). (1)
In cross-validation, the data is split into k parts; the model is then trained
on k-1 parts of the data, and the remaining part is used for testing.</p>
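        <p>The whole supervised scheme of this section can be sketched in a few dozen lines: bag-of-words features, a naive Bayes classifier with Laplace smoothing, and k-fold cross-validation reporting accuracy. The tiny labeled corpus is an illustrative assumption.</p>
        <preformat>
```python
import math
from collections import Counter, defaultdict

def features(text):
    """Represent a document as a bag-of-words feature vector."""
    return Counter(text.lower().split())

def train_nb(docs):
    """Train naive Bayes on (text, label) pairs: log-priors per class and
    Laplace-smoothed log-likelihoods per word."""
    class_counts = Counter(label for _, label in docs)
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in docs:
        bag = features(text)
        word_counts[label].update(bag)
        vocab.update(bag)
    priors, likelihoods = {}, {}
    for label, n in class_counts.items():
        priors[label] = math.log(n / len(docs))
        denom = sum(word_counts[label].values()) + len(vocab)  # Laplace smoothing
        likelihoods[label] = {w: math.log((word_counts[label][w] + 1) / denom)
                              for w in vocab}
    return priors, likelihoods

def classify(model, text):
    """Pick the class with the highest posterior log-probability."""
    priors, likelihoods = model
    scores = {}
    for label, prior in priors.items():
        scores[label] = prior + sum(count * likelihoods[label][word]
                                    for word, count in features(text).items()
                                    if word in likelihoods[label])
    return max(scores, key=scores.get)

def cross_validate(docs, k=3):
    """Split into k folds; train on k-1 folds, test on the held-out one.
    Returns accuracy = correctly classified / total, as in formula (1)."""
    folds = [docs[i::k] for i in range(k)]
    correct = total = 0
    for i in range(k):
        train = [d for j, fold in enumerate(folds) if j != i for d in fold]
        model = train_nb(train)
        correct += sum(classify(model, text) == label for text, label in folds[i])
        total += len(folds[i])
    return correct / total

# Illustrative pre-labeled corpus (assumption, not real review data).
docs = [
    ("great phone love it", "positive"),
    ("awful battery hate it", "negative"),
    ("love the great screen", "positive"),
    ("hate the awful camera", "negative"),
    ("great value love this", "positive"),
    ("awful service hate them", "negative"),
]
print(cross_validate(docs, k=3))  # -> 1.0 on this tiny separable corpus
```
        </preformat>
        <p>On real data the corpus would be far larger and the feature vectors would typically use n-grams and weighting, as discussed above.</p>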
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Review of existing systems for determining the tone of text</title>
      <p>IBM Watson Explorer is a set of models that allows one to search for information in
text and to extract entities and relationships.</p>
      <p>Baidu ERNIE 2.0 is a framework for natural language understanding, available in
English and Chinese; it supports inference, semantic similarity determination,
named entity recognition, sentiment analysis, and matching of questions
and answers.</p>
      <p>The Apple natural language processing platform is a framework for language
identification, tokenization, lemmatization, part-of-speech tagging and named entity
recognition.</p>
      <p>Facebook bAbI is a platform for automatic interpretation of texts, as well as a set
of datasets for testing natural language understanding algorithms.</p>
      <p>Facebook FastText is a framework for text classification and for extracting key words
and named entities.</p>
      <p>Tencent NLP is an open platform with semantic analysis features that provides an API
for developing NLP solutions and natural language processing
applications.</p>
      <p>AliReader is a technology for analyzing unstructured text, intelligent search and
retrieval of information from a variety of documents, used in many Alibaba products.
Russian companies are leading developers in the field of NLP, present on the market in
several categories. The first is search engines and companies that have worked on text
technologies for many years: «Yandex», ABBYY, Mail.ru, PROMT and RCO
(part of the Rambler group).</p>
      <p>The second category is large corporations that began forming their AI competence
only in the last 3-4 years, for example Sberbank, «Tinkoff Bank» and
MTS. All of them have achieved impressive results, even though they mostly build the
technology for internal use.
"Yandex.Toloka" is a crowdsourcing platform for collecting and processing
data for ML projects, training search algorithms and neural networks, and
developing speech technologies and computer vision. The platform has more
than 5 million performers and 20 thousand customers. The collected assessments are
used to develop voice assistants and chat bots and for research in different domains.
"Sberbank": monitoring and automatic content analysis of news about 1000
partner banks in Russian. The ABBYY NLP solution selects meaningful messages,
categorizes news by various risk factors and collects relevant data for dossiers on the banks.
Just AI Conversational Platform is an enterprise-level platform for developing
conversational chat bots and assistants that understand natural language. Chat bots
launched on the platform solve complex business challenges: customer support,
recruitment and training of staff, ordering and selling goods.</p>
      <p>PROMT Analyzer SDK is a component for information analysis systems. It allows
automatic analysis of Big Data in different languages, extracts facts,
mentioned persons, organizations, events and other entities, and determines the tone of
statements and documents.</p>
      <p>EUREKA ENGINE (BRAND ANALYTICS) is a linguistic text analysis system of
modular type that extracts new knowledge and facts from huge volumes of unstructured
data in real time.</p>
      <p>RCO Fact Extractor SDK is a tool for computer analysis of textual information;
the package is designed for developers of information-analytical and search engines.
RCO Text Categorization Engine is a developer library for information retrieval
systems that, based on lexical profiles, determines whether a text belongs to a given set
of categories and obtains the number of occurrences and the positions of the selected
terms in the text.
The iPavlov project aims to overcome technological barriers in meaningful
human-machine communication in natural language by creating and introducing into
business practice tools that lower the threshold for entering the market of text
dialogue systems. The goal is pursued through research and
development of network architectures for working with text in natural language.
Creating an opinion analysis system is a challenging task, but a doable one if there
are data for training and a pre-defined topic. [14] When using machine learning it is
important to test different options and pick the ones that work best on the test data:
different classification algorithms (NB, SVM), feature sets (unigrams, bigrams,
character n-grams), and feature weighting functions. There are many ways to improve
tone classification, such as the use of tonal dictionaries, additional linguistic
features (e.g. parts of speech), and general methods for improving machine
learning (boosting, bagging, etc.). [16]</p>
    </sec>
    <sec id="sec-4">
      <title>Results</title>
      <p>We decided to compare two methods: the dictionary-based approach and
supervised machine learning. For machine learning we applied the following algorithm:
 First, choose a collection of documents on which the
classifier will be trained;
 Represent each document as a vector of features (aspects);
 Assign each document the correct sentiment class;
 Choose a classification algorithm and a method for training the
classifier;
 Apply the resulting model.</p>
      <p>The stages of the working program are:
 Lemmatization of a document: bringing all words in the document to their initial
form using the pymorphy2 morphological analyzer, and removing punctuation marks,
auxiliary parts of speech and words that contain letters of the Latin alphabet.
 Extraction of text features: the document is mapped to 3 numbers, which are a
numerical characteristic of its emotional coloring, calculated using the TF-IDF
formula. Weights are calculated for unigrams (single document words),
bigrams (combinations of two words) and trigrams (combinations of 3
words).
 Determining the document class ("positive", "negative") using the naive Bayes
classifier.</p>
      <p>For the dictionary-based approach we followed this algorithm:
 Data cleaning: the whole text is scanned and extra characters are removed;
 All words are reduced to their initial form using OpenCorpora;
 Words are then weighted using tonal dictionaries as well as dictionaries of
expressions and idioms;
 In addition to calculating negative, neutral and positive tonalities, emotions are
calculated using the OCC model;
 After determining the tone and calculating the values of the 16
emotions, the program evaluates the tonality of the text. The calculation of
emotions is one of the important processes, because their identification improves the
operation of the sentiment determination algorithm and at the same time checks it
for errors.</p>
      <p>A dataset of 30,000 tweets with known sentiment was selected for testing our
programs. The program based on supervised machine learning made a mistake
32% of the time, while the program based on the dictionary approach made a mistake
25% of the time. Thus, on this task the dictionary-based approach works better than
supervised machine learning.</p>
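      <p>The comparison above reduces to computing each program's error rate on the labeled tweets. A minimal sketch follows; the four-item label lists are made up for illustration, not the 30,000-tweet dataset itself.</p>
      <preformat>
```python
# Share of predictions that disagree with the gold labels.
def error_rate(predicted, gold):
    assert len(predicted) == len(gold)
    wrong = sum(p != g for p, g in zip(predicted, gold))
    return wrong / len(gold)

gold      = ["pos", "neg", "pos", "neg"]
dict_pred = ["pos", "neg", "pos", "pos"]   # 1 of 4 wrong, like the 25% above
print(error_rate(dict_pred, gold))  # -> 0.25
```
      </preformat>
      <p>An error rate of 0.25 corresponds to 75% accuracy by formula (1).</p>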
    </sec>
    <sec id="sec-5">
      <title>Discussion and further directions</title>
      <p>Sentiment analysis is a field of computational linguistics devoted to the automatic
identification of a person's assessments and emotions regarding entities in a text, or
to identifying the overall emotional assessment of a statement.
Tonality is the emotional coloring expressed in the text. This analysis is used both for
commercial purposes and for solving scientific research problems.</p>
      <p>There are two types of opinions, simple and comparative. Most of the works in this
research area deal with identifying simple opinions, since comparative
ones are extremely difficult to analyze.</p>
      <p>The analysis of the emotional coloring of a text is most often divided into six tasks:
extraction of objects, aspects and the author, and their classification; extraction of
time and its standardization; determination of tonality; and drawing the opinion
conclusion.</p>
      <p>The main problems that arise when analyzing sentiment include the dependence of
sentiment on the subject area, the use of emotive vocabulary in neutral sentences,
sarcasm, the dependence of sentiment on the user who reads the message, as well as
the expression of sentiment without using emotionally colored words. These problems
can be eliminated with varying degrees of success in the process of sentiment
analysis.</p>
      <p>There are four main approaches to the analysis of the sentiment of texts:
 a rule-based method;
 a method using a dictionary of emotional vocabulary;
 a method based on supervised learning;
 a method based on unsupervised learning.</p>
      <p>Each of the approaches has advantages and disadvantages.</p>
      <p>The supervised learning method was discussed in detail, as it is one of the most
popular approaches to sentiment analysis. This approach has five main steps. At the first
stage, a training sample is prepared. Each document is represented as a vector of
features (most often either a "bag of words" or n-grams) with subsequent assignment of
weights to each element of the vector. Then a classification algorithm is selected; we
looked at how the two most efficient algorithms work, the naive
Bayes classifier and the support vector machine. To evaluate the performance of
the model, either precision and recall can be calculated, or cross-validation can be used.
The method using a dictionary of emotional vocabulary is also popular, for example
the SentiStrength software. It was developed by Mike Thelwall, Kevan Buckley and their
colleagues at the University of Wolverhampton in 2010. The program evaluates the
sentiment strength of short messages simultaneously on two scales, positive and
negative, from 1 to 5 and from -1 to -5. The system is based on the use of emotional
vocabulary and corrective rules. The program was developed on messages from
the MySpace social network, and its dictionary was partially supplemented with
vocabulary from the LIWC dictionary. The process of creating SentiStrength and its
algorithm were described in detail above.</p>
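      <p>The dual-scale scoring described above can be sketched as follows: every text receives both a positive strength (1 to 5) and a negative strength (-1 to -5), where 1 means no sentiment of that polarity. The term strengths below are illustrative assumptions, not the real SentiStrength lexicon or its corrective rules.</p>
      <preformat>
```python
# Illustrative term strengths (assumption, not the SentiStrength lexicon).
POSITIVE = {"love": 3, "great": 2, "amazing": 4}
NEGATIVE = {"hate": 4, "bad": 2, "awful": 3}

def dual_scale(text):
    """Return (positive, negative) scores based on the strongest term found."""
    words = text.lower().split()
    pos = 1 + max([POSITIVE[w] for w in words if w in POSITIVE], default=0)
    neg = 1 + max([NEGATIVE[w] for w in words if w in NEGATIVE], default=0)
    return min(pos, 5), -min(neg, 5)

print(dual_scale("i love it but the battery is awful"))  # -> (4, -4)
print(dual_scale("nothing special"))                     # -> (1, -1)
```
      </preformat>
      <p>Reporting the two polarities separately, rather than a single net score, is what lets a text be simultaneously strongly positive and strongly negative.</p>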
      <p>The algorithm of this tool is more effective than other methods at analyzing
the positive sentiment of short informal texts. The output of the program
was evaluated using accuracy and the correlation coefficient between expert
ratings and the program's ratings. Later, the developers made several attempts to
improve the program for negative sentiment by expanding the original data.</p>
    </sec>
    <sec id="sec-6">
      <title>Conclusion</title>
      <p>Thus, we can conclude that all methods of determining tonality have merit. At the
moment, there is no universal solution, and everything depends on the task and the
needs of the customer. For example, the rule-based approach gives excellent results, but
it is limited in application to the selected domain (topic): a sharp change of
domain strongly affects sentiment determination, and the approach does not work well
where completely different topics intersect. Machine learning
approaches are also not perfect, but they are popular owing to the current technological
wave and the development of the information society. These approaches have many
extensions which, if applied correctly, give good results. Dictionary approaches also
have their drawbacks, showing good results on some tasks and poor results on others,
because dictionary approaches are not universal. Combined with other approaches such
as machine learning, however, they may well provide a very accurate determination of
sentiment regardless of the subject matter and data structure.</p>
      <p>10. Das, S., &amp; Chen, M. (2001). Yahoo! for Amazon: Extracting market sentiment from
stock message boards. Proceedings of the Asia Pacific Finance Association Annual
Conference (APFA), Bangkok, Thailand, July 22-25, last accessed 20.02.2020:
http://sentiment.technicalanalysis.org.uk/DaCh.pdf.
11. Derks, D., Bos, A. E. R., &amp; von Grumbkow, J. (2008). Emoticons and online
message interpretation. Social Science Computer Review, 26(3), pp. 379-388.
12. Fox, E. (2008). Emotion science. Basingstoke: Palgrave Macmillan, p. 127.
13. Freitas, A. A., &amp; de Carvalho, A. C. P. L. F. (2007). Research and trends in data
mining technologies and applications: tutorial on hierarchical classification with
applications in bioinformatics.
14. Fullwood, C., &amp; Martino, O. I. (2007). Emoticons and impression formation. The
Visual in Popular Culture, 19(7), pp. 4-14.
15. Gamon, M., Aue, A., Corston-Oliver, S., &amp; Ringger, E. (2005). Pulse: Mining
customer opinions from free text (IDA 2005). Lecture Notes in Computer Science, 3646,
pp. 121-132.
16. Ghazi, D., Inkpen, D., &amp; Szpakowicz, S. (2010). Hierarchical versus flat
classification of emotions in text. Proceedings of the NAACL HLT 2010 Workshop on
Computational Approaches to Analysis and Generation of Emotion in Text, pp. 140-146,
Los Angeles, California, June 2010.
17. Joshi, M., Das, D., Gimpel, K., &amp; Smith, N. A. (2010). Movie reviews and
revenues: An experiment in text regression. Proceedings of the North American
Chapter of the Association for Computational Linguistics Human Language
Technologies Conference (NAACL 2010).
18. Jurafsky, D., &amp; Martin, J. H. (2009). Speech and Language Processing: An
Introduction to Natural Language Processing, Computational Linguistics, and Speech
Recognition. Second Edition. Pearson Education International, 1024 pp.
19. Kan, D. (2011). Rule-based approach to sentiment analysis. Sentiment Analysis
Track at ROMIP.
20. Krippendorff, K. (2004). Content analysis: An introduction to its methodology.
Thousand Oaks, CA: Sage.
21. Kukich, K. (1992). Techniques for automatically correcting words in text. ACM
Computing Surveys, 24(4), pp. 377-439.
22. Liu, B. (2012). Sentiment Analysis and Opinion Mining. Morgan &amp; Claypool
Publishers.
23. Liu, B. (2011). Sentiment Analysis Tutorial. AAAI-2011, San Francisco, USA.
24. Liu, Y., Huang, X., An, A., &amp; Yu, X. (2007). ARSA: a sentiment-aware model
for predicting sales performance using blogs. SIGIR 2007, pp. 607-614.
25. Research by Frost &amp; Sullivan in the interests of Up Great Technology Competitions
(organized by RVC, ASI and the Skolkovo Foundation).
26. Thelwall, M., Buckley, K., Paltoglou, G., &amp; Cai, D. Sentiment strength detection in
short informal text. Statistical Cybermetrics Research Group, School of Computing and
Information Technology, University of Wolverhampton, Wulfruna Street,
Wolverhampton WV1 1SB, UK.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>Asur</given-names>
            <surname>Sitaram</surname>
          </string-name>
          and
          <article-title>Bernardo A. Huberman. Predicting the future with social media</article-title>
          .
          <source>Arxiv preprint arXiv: 1003.5699</source>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>Babbar</given-names>
            <surname>Rohit</surname>
          </string-name>
          , Partalas Ioannis, Gaussier Eric, Amini Massih-Reza.
          <article-title>On Flat versus Hierarchical Classification in Large-Scale Taxonomies</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Baccianella</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Esuli</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Sebastiani</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          (
          <year>2010</year>
          ).
          <article-title>SentiWordNet 3.0: An enhanced lexical resource for sentiment analysis and opinion mining</article-title>
          .
          <source>Proceedings of the Seventh conference on International Language Resources and Evaluation</source>
          , pp.
          <fpage>2200</fpage>
          -
          <lpage>2204</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4. boyd, d. (
          <year>2008</year>
          ).
          <article-title>Taken out of context: American teen sociality in networked publics</article-title>
          . University of California, Berkeley.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5. boyd, d. (
          <year>2008</year>
          ).
          <article-title>Why youth (heart) social network sites: The role of networked publics in teenage social life</article-title>
          . In D. Buckingham (Ed.), Youth, identity, and digital media, pp.
          <fpage>119</fpage>
          -
          <lpage>142</lpage>
          . Cambridge, MA: MIT Press.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Bradley</surname>
            ,
            <given-names>M. M.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Lang</surname>
            ,
            <given-names>P. J.</given-names>
          </string-name>
          (
          <year>1999</year>
          ).
          <article-title>Affective Norms for English Words (ANEW): Stimuli, instruction manual, and affective ratings</article-title>
          (
          <source>Tech. Report C-1</source>
          )
          . Gainesville: University of Florida, Center for Research in Psychophysiology.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Brill</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          (
          <year>1992</year>
          ).
          <article-title>A simple rule-based part of speech tagger</article-title>
          .
          <source>Proceedings of the Third Conference on Applied Natural Language Processing</source>
          , pp.
          <fpage>152</fpage>
          -
          <lpage>155</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Cha</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Haddadi</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Benevenuto</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Gummadi</surname>
            ,
            <given-names>K.P.</given-names>
          </string-name>
          .
          <article-title>Measuring User Influence in Twitter: The Million Follower Fallacy</article-title>
          .
          <source>Proceedings of the 4th International AAAI Conference on Weblogs and Social Media (ICWSM)</source>
          , Washington, May
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Choi</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Cardie</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          (
          <year>2008</year>
          ).
          <article-title>Learning with compositional semantics as structural inference for subsentential sentiment analysis</article-title>
          .
          <source>Proceedings of the Conference on Empirical Methods in Natural Language Processing</source>
          , pp.
          <fpage>793</fpage>
          -
          <lpage>801</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>