<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Sentiment Evaluation: User, Business Assessment and Hashtag Analysis</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Chetan Jha</string-name>
          <email>chetan.jha3@mail.dcu.ie</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ray Walshe</string-name>
          <email>ray.walshe@dcu.ie</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Dublin City University</institution>
          ,
          <addr-line>Dublin</addr-line>
          ,
          <country country="IE">Ireland</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Social media has become an important platform for the public to express their opinions and comment on all manner of things. Analysis of social media data can yield interesting facts and views about a person, product, business or issue. This paper focuses on content profiling based on sentiment analysis of Twitter data for a particular user, business or hashtag, by focusing on the emotions, reactions and opinions written on Twitter by different users in the form of tweets, and by using statistical, learning and natural language processing techniques. To accomplish this task, the paper uses various machine learning techniques in combination with the Python Natural Language Toolkit (NLTK) VADER library. Different machine learning classifiers are evaluated, and the results show that Artificial Neural Networks perform best with 76.46% accuracy, Random Forest is the second most accurate classifier with 75.95% accuracy, and Multinomial Naïve Bayes achieved 75.65% accuracy. The methods described here provide users with a robust and flexible way of profiling Twitter users using sentiment extracted from tweet data.</p>
      </abstract>
      <kwd-group>
        <kwd>Sentiment Analysis</kwd>
        <kwd>User profiling</kwd>
        <kwd>Classification algorithms</kwd>
        <kwd>Artificial Neural Networks</kwd>
        <kwd>Convolutional Neural Networks</kwd>
        <kwd>Multinomial Naïve Bayes</kwd>
        <kwd>RandomForest</kwd>
        <kwd>K-Nearest Neighbour</kwd>
        <kwd>Support Vector Machine</kwd>
        <kwd>NLTK VADER</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>The data used for this paper is in the form of tweets. Due to the restricted length of 140 characters, people use
acronyms, emoticons, abbreviations and slang words. Tweets also contain hashtags, which mark topics or subjects and
start with the ‘#’ symbol. People refer to another Twitter user using the ‘@’ symbol along with the username. Tweets can
also contain URLs inserted by the user linking to an image, document or website.</p>
      <sec id="sec-1-1">
        <title>Training dataset</title>
        <p>
          The training data used in this paper is the Sanders dataset [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ]. Niek Sanders provided the Twitter sentiment corpus
publicly on October 24, 2011. This dataset consists of 5513 tweets which have been manually annotated as
positive, negative, neutral or irrelevant. These tweets are technology-based tweets about four companies: Apple, Google,
Microsoft and Twitter. Of these, 570 are positive, 654 negative, 2503 neutral and 1786 irrelevant.
        </p>
        <p>The dataset contains each tweet ID and its corresponding sentiment. Approximately 4364 of these tweets could be downloaded, of which
428 were positive, 475 negative, 2003 neutral and 1458 irrelevant.</p>
      </sec>
      <sec id="sec-1-2">
        <title>Test dataset</title>
        <p>Real-time Twitter data was acquired and stored in the database for the research presented in this paper. The data was
downloaded for a few different kinds of Twitter users, related to technology, politics and sports, and ranges
from 20,000 to 60,000 tweets per user.</p>
        <p>
          The system designed and presented in this paper will run for any Twitter user, with the data fetched from Twitter in
real time subject to the rate limits [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ] applied by Twitter for search, which are 180 requests for user-based authentication and 450
for app-based authentication. The Twitter search API also restricts searches to a sampling of recent tweets
published in the past seven days.
        </p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>Tools and Applications</title>
      <sec id="sec-2-1">
        <title>Django Web Framework</title>
        <p>
          Django [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ] is a high-level Python web framework which is free and open-source. It follows a Model-View-Template
(MVT) architecture and contains a set of components such as user authentication, a management panel and forms. These features
help alleviate some of the overhead of building a new site. Django also provides access to relevant Python
libraries such as Tweepy, Scikit-learn and PyMongo.
        </p>
        <p>
          This framework was used to implement the client interaction server, allowing users to interact with the system
developed for this paper. To enhance the user interface, the Metronic theme [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ] was used.
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>Scikit-learn</title>
        <p>
          Scikit-learn [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ] is a Python [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ] based machine learning library which provides a range of supervised and unsupervised
learning algorithms. Scikit-learn is built upon SciPy (Scientific Python) and works with some of the most important Python
libraries for data science, such as NumPy, SciPy, Matplotlib, IPython, SymPy and Pandas. Scikit-learn is focused on data
modelling, which includes clustering, cross-validation, dimensionality reduction, and feature extraction and selection.
        </p>
      </sec>
      <sec id="sec-2-3">
        <title>Keras</title>
        <p>
          Keras [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ] is a Python-based, open-source, high-level neural network API. It is capable of running machine learning models
such as convolutional and recurrent neural networks using TensorFlow [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ], CNTK [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ] or Theano [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ] on CPU as well as
GPU. It makes working with libraries like TensorFlow, CNTK and Theano easy and user-friendly.
        </p>
      </sec>
      <sec id="sec-2-4">
        <title>TensorFlow</title>
        <p>
          TensorFlow [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ] is an open-source library developed by Google to meet its needs for building and training neural
networks in a way that is user-friendly and easy to implement [
          <xref ref-type="bibr" rid="ref21">21</xref>
          ]. TensorFlow is more popular than Theano and Torch
[
          <xref ref-type="bibr" rid="ref21">21</xref>
          ]. It is supported by a large community and also has a very interactive visualisation dashboard called TensorBoard.
        </p>
      </sec>
      <sec id="sec-2-5">
        <title>Tweepy</title>
        <p>
          Tweepy [
          <xref ref-type="bibr" rid="ref22">22</xref>
          ] is an open-source library which enables Python to communicate with the Twitter platform and use its
RESTful API. Tweepy has a Cursor object which helps to paginate the downloading and iteration of tweets. It can recursively
download the number of tweets requested by the user while respecting the rate limits [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ] applied by Twitter. Tweepy
was used to fetch tweets for the training database and is also responsible for fetching tweets at runtime.
        </p>
      </sec>
      <sec id="sec-2-6">
        <title>MongoDB</title>
        <p>
          MongoDB [
          <xref ref-type="bibr" rid="ref23">23</xref>
          ] is an open-source database that uses document-oriented data model. It is a NoSQL [
          <xref ref-type="bibr" rid="ref24">24</xref>
          ] database which is
highly scalable and performance-efficient. Instead of the rows and columns of SQL databases,
MongoDB is based on collections and documents. Each document comprises key-value pairs, which are the fundamental unit
of data in MongoDB. Like other NoSQL databases, MongoDB has a dynamic schema which allows documents to
have different structures and fields. MongoDB performs faster than relational databases like Oracle [
          <xref ref-type="bibr" rid="ref25">25</xref>
          ] and MySQL
[
          <xref ref-type="bibr" rid="ref26">26</xref>
          ] and PyMongo [
          <xref ref-type="bibr" rid="ref27">27</xref>
          ] provides an easy and recommended way to work with MongoDB from Python.
        </p>
      </sec>
      <sec id="sec-2-7">
        <title>NLTK VADER</title>
        <p>
          VADER [
          <xref ref-type="bibr" rid="ref28">28</xref>
          ] is an acronym for Valence Aware Dictionary and sEntiment Reasoner, which was developed in 2014 [
          <xref ref-type="bibr" rid="ref29">29</xref>
          ].
It is an open-source, lexicon- and rule-based sentiment analysis tool which is specifically attuned to sentiments expressed
in social media, and it also works well in other domains. VADER can include sentiments from emoticons [ :-) ], sentiment-related
acronyms [ LOL ] and slang [ Meh ].
        </p>
        <p>When a piece of text is passed to VADER, it returns results in terms of polarity:
1. Negative polarity – negativity score of the text.
2. Neutral polarity – neutrality score of the text.
3. Positive polarity – positivity score of the text.
4. Compound polarity – the overall polarity score of the text. It will be negative if the text is largely negative
and positive if the text is overall positive. It will be zero if the text is neither negative nor positive, meaning
the text is neutral.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>System Design and Models</title>
      <sec id="sec-3-1">
        <title>Pre-processing</title>
        <p>
          The training data as well as the testing data undergoes the following pre-processing steps to bring into focus the words which
carry some sentiment. Pre-processing helps to reduce the feature space and also increases the accuracy of the machine
learning algorithms. Pre-processing involved the following steps:
1. Removal of URLs, repeated characters, numbers, and stop words like “the”, “to”, “from”
2. Changing text to lower case
3. Removal of user mentions and the “#” symbol, for example “#great” to “great”
4. Stemming – Stemming is the process of reducing a word to its root. Stemming improves system effectiveness
and results in higher accuracy of predictions [
          <xref ref-type="bibr" rid="ref33">33</xref>
          ].
        </p>
        <p>
          5. Dictionary checking implemented using PyEnchant [
          <xref ref-type="bibr" rid="ref34">34</xref>
          ], which is a spell-checking library for Python.
        </p>
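        <p>A simplified sketch of steps 1–3 is given below; stemming and dictionary checking are omitted, and the regular expressions and stop-word list are illustrative assumptions, not the exact rules used in the paper:</p>

```python
# Sketch of tweet pre-processing: URLs, mentions, '#', repeats, numbers, stop words.
import re

STOP_WORDS = {"the", "to", "from"}  # illustrative subset

def clean_tweet(text):
    text = re.sub(r"https?://\S+", "", text)     # remove URLs
    text = re.sub(r"@\w+", "", text)             # remove user mentions
    text = text.replace("#", "")                 # keep the hashtag word, drop the symbol
    text = re.sub(r"(.)\1{2,}", r"\1\1", text)   # collapse repeated characters
    text = re.sub(r"\d+", "", text)              # remove numbers
    text = text.lower()                          # change text to lower case
    tokens = [t for t in text.split() if t not in STOP_WORDS]
    return " ".join(tokens)

print(clean_tweet("@user Loooove the new #iPhone7 http://t.co/x"))
```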
      </sec>
      <sec id="sec-3-2">
        <title>Sentiment Analysis</title>
        <p>
          Feature Extraction and Selection. Feature extraction [
          <xref ref-type="bibr" rid="ref35">35</xref>
          ] is the process of transforming arbitrary data, such as text or
images, into numerical features usable for machine learning. When dealing with a large dataset, a major problem arises
from the number of variables involved. Analysis involving a large number of variables requires a significant amount of
computational power and memory. It can also cause the machine learning algorithm to over-fit the training data, causing
poor predictive performance on new data. Feature extraction is therefore a method of creating a combination of input variables
to avoid such problems and to accurately predict the results.
        </p>
        <p>
          Scikit-learn provides two suitable feature extraction classes:
CountVectorizer. CountVectorizer [
          <xref ref-type="bibr" rid="ref36">36</xref>
          ] converts a collection of text documents into a matrix of token counts. Each text is
separated into tokens and the number of times each token occurs is counted.
        </p>
        <p>
          TfidfVectorizer. TfidfVectorizer [
          <xref ref-type="bibr" rid="ref37">37</xref>
          ] is similar to the CountVectorizer in that it also creates a document-term matrix, but instead
of filling it with word counts it calculates the term frequency–inverse document frequency (tf–idf) value for each word.
        </p>
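        <p>The difference between the two vectorizers can be seen on a toy corpus (a sketch; the example texts are invented):</p>

```python
# Sketch: CountVectorizer vs TfidfVectorizer on a toy corpus.
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

corpus = ["good phone", "bad phone", "good good battery"]

counts = CountVectorizer().fit_transform(corpus)  # raw token counts
tfidf = TfidfVectorizer().fit_transform(corpus)   # tf-idf weights instead of counts

print(counts.toarray())  # rows = documents, columns = vocabulary terms
print(tfidf.toarray())
```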
        <p>tf–idf = term frequency × inverse document frequency, where:
term frequency = number of occurrences of the word in a document
inverse document frequency = inverse of the number of documents, out of all the documents, in which the word occurs</p>
        <p>Machine Learning Classifiers. The machine learning approach to classifying tweets by sentiment uses a training
set to develop a sentiment classifier that classifies the sentiments of tweets.</p>
        <p>
          The subsequent sections will explore some of the most commonly [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ] [
          <xref ref-type="bibr" rid="ref38">38</xref>
          ] used machine learning classifiers for text
classification.
        </p>
        <p>
          Multinomial Naïve Bayes (MNB). The Naïve Bayes classifier is based on Bayes' theorem and assumes that the features used for
classification are independent. Despite this assumption usually being false, analysis has shown there are some theoretical
reasons for the high efficiency of Naïve Bayes classifiers [
          <xref ref-type="bibr" rid="ref39">39</xref>
          ]. Though the probability estimates of Naïve Bayes are not
good, its classification decisions are quite good [
          <xref ref-type="bibr" rid="ref40">40</xref>
          ]. The Naïve Bayes classifier is also recommended when
computational power and memory capacity are limited.
        </p>
        <p>
          The Multinomial Naïve Bayes [
          <xref ref-type="bibr" rid="ref41">41</xref>
          ] classifier is a specialised version of Naïve Bayes designed for the
classification of data with discrete features, for example word counts for text classification. MNB requires numerical feature
counts as input.
        </p>
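        <p>A minimal sketch of training an MNB classifier on word counts with scikit-learn; the tiny labelled corpus is invented for illustration:</p>

```python
# Sketch: Multinomial Naive Bayes over word counts via scikit-learn.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

train_texts = ["great phone love it", "awful battery hate it",
               "love the camera", "hate the screen"]
train_labels = ["positive", "negative", "positive", "negative"]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(train_texts)  # MNB needs numerical feature counts
clf = MultinomialNB().fit(X, train_labels)

pred = clf.predict(vectorizer.transform(["love this phone"]))[0]
print(pred)
```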
        <p>
          Random Forest (RF). Random Forest [
          <xref ref-type="bibr" rid="ref42">42</xref>
          ] is an ensemble algorithm, i.e. a combination of more than one algorithm of the same or
different kinds for classifying objects. A random forest operates by constructing a large number of decision
trees at training time and outputting the mode of the classes predicted by the individual trees. By building the trees on random subsets of the data, their
correlation is reduced and predictive power is increased. Random Forest supports parallelisation and concurrency of
different trees. With a small dataset and a large number of trees, Random Forest can over-fit the data.
Random Forest has been found to provide good accuracy in the classification of datasets for opinion mining [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ].
K-Nearest Neighbour (KNN). K-Nearest Neighbour [
          <xref ref-type="bibr" rid="ref43">43</xref>
          ] classifier is one of the simplest classification algorithms. It is a
non-parametric, lazy learning algorithm, which means that it makes no assumptions about the underlying data
distribution. Due to its lazy nature, its training phase is fast, and it does not build any generalisation of the training data, which means
it requires the training data during the testing phase. K-Nearest Neighbour assumes data points lie in a metric space and
requires each point to be a scalar or a multidimensional vector.
        </p>
        <p>
          Support Vector Machine (SVM). Support Vector Machine [
          <xref ref-type="bibr" rid="ref44">44</xref>
          ] classifier can be used for both classification and regression
problems. A support vector machine classifies data by finding the best hyperplane that separates the data points of the
respective classes. A Support Vector Machine natively handles classification with exactly two classes, but it can be used for multi-class
classification by recursively reducing the multi-class data to binary problems over random partitions of the set of classes. This
increases the time taken but also enhances the accuracy of learning.
Artificial Neural Network (ANN). Artificial Neural Networks [
          <xref ref-type="bibr" rid="ref45">45</xref>
          ] are computational algorithms intended to simulate the
behaviour of biological systems composed of neurons. They are constructed as a system of interconnected neurons which
compute values from the inputs provided.
        </p>
        <p>A neuron has multiple inputs, does some processing on the received inputs and gives an output. Neurons are organised in
layers, including hidden layers. The input values to a neuron are received either from the input layer, if it is the
first layer of neurons, or from the neurons of the previous layer. The connection between an input and a
neuron is called a synapse, and by adjusting the synapse weights it can be decided how much of the input signal is passed to the
neuron.</p>
        <p>input to neuron = Σᵢ (inputᵢ × weightᵢ)</p>
        <p>Inside a neuron, an activation function is applied to the summation of all weighted inputs. The output value of a neuron
can be a continuous value (for example, prices), a binary value or a categorical value.</p>
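        <p>The weighted-sum-and-activation behaviour described above can be sketched for a single neuron; the sigmoid activation and the example weights are assumptions for illustration:</p>

```python
# Sketch of one artificial neuron: weighted sum of inputs, then an activation.
import numpy as np

def neuron(inputs, weights):
    z = np.dot(inputs, weights)        # sum of input_i * weight_i over the synapses
    return 1.0 / (1.0 + np.exp(-z))    # sigmoid activation

out = neuron(np.array([1.0, 0.0, 1.0]), np.array([0.5, -0.3, 0.5]))
print(round(float(out), 4))
```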
        <p>
          The difference between the output and the actual value is used to calculate the cost function C. The weights are then adjusted and
the cost function is evaluated once again. This process repeats until the cost function is minimised.
Convolutional Neural Network (CNN). Convolutional Neural Networks [
          <xref ref-type="bibr" rid="ref46">46</xref>
          ] [
          <xref ref-type="bibr" rid="ref47">47</xref>
          ] [
          <xref ref-type="bibr" rid="ref48">48</xref>
          ] [49] are responsible for breakthroughs
in image classification and are a core part of most self-driving cars and image detection systems, for example
Facebook's automated photo tagging. A CNN contains several layers of convolutions with non-linear activation functions, such as
the rectifier function or the hyperbolic tangent function, applied to the results. In a CNN, each input neuron is not connected to
each output neuron in the next layer as in a traditional feedforward neural network. Instead, convolutions are applied over the
input layer to compute the output, which creates local connections: each region of the input is connected to a neuron in the
output. Each layer applies multiple filters and combines their results.
        </p>
        <p>For text classification, the input to the classifier is sentences or documents represented as a matrix. In this matrix, each
row corresponds to one token (word). In image processing, the filters slide over local patches of an image, while in Natural
Language Processing the filters slide over full rows of the matrix, each of which corresponds to a word. For example, see Figure
4, which illustrates the CNN process.</p>
        <p>User Interaction Portal. A user portal was developed using the Django framework to allow users to interact with the project.
A user can input either one or two Twitter usernames to profile and compare them. When usernames are provided, the
system will respond to the request with the following results:
1. User details: name, location, profile picture, profile background image, description, followers count and tweets
count.
2. Tweets: the most recent 200 tweets by the user, the most recent 200 tweets for the user (@mention), and the most popular 200 tweets for
the user (@mention).
3. Sentiment profiling:
a. Timeline – the number of positive, negative and neutral tweets for the Twitter user over the past 24 hours, past week, past
month, past year and all time.
b. Sentiment reach – the number of people the sentiments reached, i.e. the spread of positive, negative
or neutral influence.
c. Regional stats – this was planned and proposed but was not feasible due to the limited location-based data. For
example, only 43 out of 33,356 tweets contained location data. User location was tried as a fallback
mechanism, but it contained incorrect values.</p>
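        <p>The timeline statistics in item 3(a) amount to bucketing already-classified tweets by age; the following sketch uses invented data, and the window lengths are assumptions:</p>

```python
# Sketch: counting positive/negative/neutral tweets per time window.
from datetime import datetime, timedelta, timezone

now = datetime(2017, 7, 1, tzinfo=timezone.utc)
# Hypothetical (timestamp, sentiment) pairs as the portal would hold them.
tweets = [
    (now - timedelta(hours=3), "positive"),
    (now - timedelta(days=2), "negative"),
    (now - timedelta(days=40), "neutral"),
]

WINDOWS = {
    "past_24h": timedelta(hours=24),
    "past_week": timedelta(weeks=1),
    "past_month": timedelta(days=30),
    "past_year": timedelta(days=365),
}

def timeline(tweets, now):
    counts = {w: {"positive": 0, "negative": 0, "neutral": 0} for w in WINDOWS}
    counts["all_time"] = {"positive": 0, "negative": 0, "neutral": 0}
    for ts, sentiment in tweets:
        counts["all_time"][sentiment] += 1
        for name, span in WINDOWS.items():
            if now - ts <= span:  # tweet falls inside this window
                counts[name][sentiment] += 1
    return counts

stats = timeline(tweets, now)
print(stats)
```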
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Experiments and Results</title>
      <p>For the purposes of this paper, as shown in Figure 2, the data annotated by Sanders was downloaded, cleaned and pre-processed.
This data was divided in an 80:20 ratio for training and testing respectively. Different feature extractors and selectors were
used to create a sparse matrix of words. NLTK VADER results were obtained for the tweets, and the positive and negative
polarity of each tweet was inserted into two new columns of the previously created sparse matrix. This data was used to train
the machine learning algorithms. Results were predicted for the test data and accuracy was calculated using 10-fold cross-validation,
wherein the original sample is divided into 10 samples, of which 9 are used for training and one is used
for validation. This process is repeated 10 times so that each sample is used as a validation sample exactly once.
The results are averaged to give the prediction accuracy. Table 1 shows the accuracy of the various classifiers,
where the columns denote the classifiers used and the rows signify the type of enhancement method used before training
the classifiers. The row termed “basic” does not utilise the enhancement methods mentioned in the other rows of the table.</p>
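      <p>The 10-fold cross-validation step can be reproduced with scikit-learn's cross_val_score; in this sketch, synthetic data stands in for the real tweet feature matrix:</p>

```python
# Sketch: 10-fold cross-validation of a classifier with scikit-learn.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB

# Synthetic "count-like" features standing in for the sparse tweet matrix.
X, y = make_classification(n_samples=200, n_features=20, random_state=0)
X = np.abs(X)  # MultinomialNB requires non-negative features

# cv=10: each fold uses 9 samples' worth for training, 1 for validation.
scores = cross_val_score(MultinomialNB(), X, y, cv=10)
print(scores.mean())  # averaged to give the prediction accuracy
```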
      <p>Also, when NLTK VADER was used on its own to predict the sentiment of the test data based on its polarity, it gave
an accuracy of 55.80% in comparison to human sentiment analysis.</p>
      <p>When n-grams were tried, the bigrams and trigrams increased the feature space multi-fold: for example, the dataset
contained 5,091 unigrams, 18,286 bigrams and 22,306 trigrams. When unigrams, bigrams and trigrams were used
simultaneously, they resulted in a feature space of 45,683 terms. N-grams, in turn, increased the time required to train and run a model
for classifiers like Random Forest and K-Nearest Neighbour, and significantly increased the memory consumption of
some classifiers like the Convolutional Neural Network (up to around 22GB), which was beyond the scope of the system setup
used. Due to the increased resource demands, and the fact that there was only a marginal increase in accuracy of 0.01%
for Multinomial Naïve Bayes, n-grams were not used.</p>
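      <p>The feature-space growth from n-grams is easy to observe with CountVectorizer's ngram_range parameter; the toy corpus below is illustrative, whereas the counts of 5,091/18,286/22,306 above come from the paper's real dataset:</p>

```python
# Sketch: how n-grams inflate the vocabulary, via CountVectorizer's ngram_range.
from sklearn.feature_extraction.text import CountVectorizer

corpus = ["the new phone is great", "the battery is bad", "great camera on the phone"]

sizes = {n: len(CountVectorizer(ngram_range=n).fit(corpus).vocabulary_)
         for n in [(1, 1), (2, 2), (3, 3), (1, 3)]}
print(sizes)  # vocabulary size for unigrams, bigrams, trigrams, and all three combined
```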
    </sec>
    <sec id="sec-5">
      <title>Conclusion</title>
      <p>Twitter is one of the most popular social networks, where users can tweet about different topics. Tweets have a
maximum size of 140 characters, which poses a significant challenge for predicting the sentiment of the text. In this paper, an
evaluation method is designed in which different machine learning classifiers are evaluated after enhancements such as using different
feature extractors, pre-processing the data and combining in the NLTK VADER polarities.</p>
      <p>Table 1 shows the results, in which the Artificial Neural Network performs best among the classifiers
under the given conditions, with 76.46% accuracy. On the other hand, the stemming process using the Snowball stemmer
provided, on average, the best increase in the accuracy of the classifiers, followed by the inclusion of the VADER polarity, which
handles emoticons, slang and acronyms automatically.</p>
      <p>Random Forest proved to be the second most accurate classifier, with 75.95% accuracy. Multinomial Naïve
Bayes, while being the fastest and the most lightweight on resources, achieved 75.65% accuracy, almost matching the RF
accuracy.</p>
      <p>The methods described here provide users with a robust and flexible way of profiling Twitter users, for example
comparing two hotels, two companies, political parties or products by providing their Twitter usernames.</p>
    </sec>
    <sec id="sec-6">
      <title>Future work</title>
      <p>For future work, further research is necessary to improve the profiling of Twitter users, increase user confidence and
provide more meaningful insight into the reasons behind positive or negative sentiments. Directions include:
1. Bias removal – While profiling a Twitter user, for example any product or company, research is required to find
a way of fetching a list of employees of the company or its subsidiary companies who may be tweeting to induce
a positive bias. Such a list could be fetched from LinkedIn, but research needs to be done on finding a way to correlate
the LinkedIn users with Twitter users.
2. Researching ways of predicting the location of users based on their friends or the content of their tweets.
3. Researching ways to analyse and categorise users based on their likes and favourites, to predict which kinds of
users are more likely to tweet positively or negatively about a user or business.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>1. Twitter, "Company | About," [Online]. Available: https://about.twitter.com/company. [Accessed 19 June 2017].</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>2. The Globe and Mail, "No strings, please: Expedia Canada ad falls flat - The Globe and Mail," [Online]. Available: https://www.theglobeandmail.com/report-on-business/no-strings-please-expedia-ad-falls-flat/article16476388/. [Accessed 23 July 2017].</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>3. MonkeyLearn Blog, "Donald Trump vs Hillary Clinton: sentiment analysis on Twitter mentions | MonkeyLearn Blog," [Online]. Available: https://monkeylearn.com/blog/donald-trump-vs-hillary-clinton-sentiment-analysis-twitter-mentions/. [Accessed 22 July 2017].</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>4. SAS, [Online]. Available: http://blogs.sas.com/content/sgf/files/2016/09/13550_Grover-Analytics-Conference-E_Poster-Final-Sid-Grover.pdf. [Accessed 23 July 2017].</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>5. Coursera, "5.2 Explanations of sentiment analysis with unsupervised learning - Yonsei University | Coursera," [Online]. Available: https://www.coursera.org/learn/text-mining-analytics/lecture/x2oe6/5-2-explanations-of-sentiment-analysis-with-unsupervised-learning. [Accessed 25 July 2017].</mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>6. Coursera, "5.1 Explanations of sentiment analysis with supervised learning - Yonsei University | Coursera," [Online]. Available: https://www.coursera.org/learn/text-mining-analytics/lecture/hbTb7/5-1-explanations-of-sentiment-analysis-with-supervised-learning. [Accessed 25 July 2017].</mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>7. Dialogue Earth, "Dialogue Earth Pulse Weather Sentiment," [Online]. Available: http://www.dialogueearth.org/pulse/weather/. [Accessed 26 May 2017].</mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>8. V. Joshi and V. Vekariya, "A Comparative Study on Classification Algorithms for Sentiment Analysis," International Journal for Scientific Research &amp; Development, vol. 4, no. 10, pp. 428-430, October 2016.</mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>9. A. Gupte, S. Joshi, P. Gadgul and A. Kadam, "Comparative Study of Classification Algorithms used in Sentiment Analysis," International Journal of Computer Science and Information Technologies, vol. 5, no. 5, pp. 6261-6264.</mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <given-names>N.</given-names>
            <surname>Sanders</surname>
          </string-name>
          ,
          <article-title>"Sanders Analytics - Twitter Sentiment Corpus,"</article-title>
[Online]. Available: http://www.sananalytics.com/lab/twitter-sentiment/. [Accessed 23 May
<year>2017</year>
].
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
Twitter,
<article-title>"Rate Limits: Chart - Twitter Developers,"</article-title>
[Online]. Available: https://dev.twitter.com/rest/public/rate-limits. [Accessed 27 May
<year>2017</year>
].
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
Django Software Foundation,
<article-title>"The Web framework for perfectionists with deadlines | Django,"</article-title>
Django Software Foundation, [Online]. Available: https://www.djangoproject.com. [Accessed 23 May
<year>2017</year>
].
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
13. keenthemes.com,
<article-title>"Metronic | #1 Selling Ultimate Bootstrap Admin Dashboard Theme,"</article-title>
keenthemes.com, [Online]. Available: http://keenthemes.com/preview/metronic/. [Accessed 28 May
<year>2017</year>
].
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
scikit-learn.org,
<article-title>"scikit-learn: machine learning in Python - scikit-learn 0.19.0 documentation,"</article-title>
[Online]. Available: http://scikit-learn.org/stable/index.html. [Accessed 02 June
<year>2017</year>
].
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
O'Reilly Media, Inc.,
<article-title>"Data scientists and data engineers like Python and Scala - O'Reilly Media,"</article-title>
[Online]. Available: https://www.oreilly.com/ideas/data-scientists-and-data-engineers-like-python-and-scala. [Accessed 22 May
<year>2017</year>
].
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
IPython development team,
<article-title>"The Jupyter Notebook - IPython,"</article-title>
[Online]. Available: https://ipython.org/notebook.html. [Accessed 27 May
<year>2017</year>
].
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17. keras.io,
          <article-title>"Keras Documentation,"</article-title>
          [Online]. Available: https://keras.io.
[Accessed 14 July
          <year>2017</year>
          ].
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
Google Inc.,
<article-title>"GitHub - tensorflow/tensorflow: Computation using data flow graphs for scalable machine learning,"</article-title>
[Online]. Available: https://github.com/tensorflow/tensorflow. [Accessed 16 July
<year>2017</year>
].
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
Microsoft,
<article-title>"The Microsoft Cognitive Toolkit,"</article-title>
[Online]. Available: https://docs.microsoft.com/en-us/cognitive-toolkit/. [Accessed 30 July
<year>2017</year>
].
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
Theano Development Team,
<article-title>"Welcome - Theano 0.9.0 documentation,"</article-title>
[Online]. Available: http://www.deeplearning.net/software/theano/. [Accessed 30 July
<year>2017</year>
].
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21. Stanford University,
          <article-title>"Lecture note 1: Introduction to TensorFlow,"</article-title>
          [Online]. Available: http://web.stanford.edu/class/cs20si/lectures/notes_01.pdf.
[Accessed 02 Aug
          <year>2017</year>
          ].
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
Tweepy.org, [Online]. Available: http://www.tweepy.org. [Accessed 29 May
<year>2017</year>
].
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23. MongoDB, Inc.,
          <article-title>"MongoDB for GIANT Ideas | MongoDB,"</article-title>
[Online]. Available: https://www.mongodb.com. [Accessed 03 July
<year>2017</year>
].
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          24. MongoDB, Inc.,
          <article-title>"NoSQL Databases Explained | MongoDB,"</article-title>
          [Online]. Available: https://www.mongodb.com/nosql-explained.
[Accessed 05 July
          <year>2017</year>
          ].
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          25.
          <string-name>
            <given-names>A.</given-names>
            <surname>Boicea</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Radulescu</surname>
          </string-name>
          and
<string-name>
<given-names>L. I.</given-names>
<surname>Agapin</surname>
</string-name>
,
<article-title>"MongoDB vs Oracle - Database Comparison,"</article-title>
IEEE, 20 November
<year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          26.
<string-name>
<given-names>C.</given-names>
<surname>Győrödi</surname>
</string-name>
,
<string-name>
<given-names>R.</given-names>
<surname>Győrödi</surname>
</string-name>
and
<string-name>
<given-names>G.</given-names>
<surname>Pecherle</surname>
</string-name>
,
<article-title>"A comparative study: MongoDB vs. MySQL,"</article-title>
IEEE, 16 July
<year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          27.
MongoDB, Inc.,
<article-title>"Python Driver (PyMongo) - Getting Started With MongoDB 3.0.4,"</article-title>
[Online]. Available: https://docs.mongodb.com/getting-started/python/client/. [Accessed 05 July
<year>2017</year>
].
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
28. nltk.org,
<article-title>"nltk.sentiment.vader - NLTK 3.2.4 documentation,"</article-title>
[Online]. Available: http://www.nltk.org/_modules/nltk/sentiment/vader.html. [Accessed 18 June
<year>2017</year>
].
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          29.
          <string-name>
            <given-names>C.</given-names>
            <surname>Hutto</surname>
          </string-name>
          and
          <string-name>
            <given-names>E.</given-names>
            <surname>Gilbert</surname>
          </string-name>
          ,
          <article-title>"VADER: A Parsimonious Rule-based Model for Sentiment Analysis of Social Media Text,"</article-title>
[Online]. Available: http://comp.social.gatech.edu/papers/icwsm14.vader.hutto.pdf. [Accessed 22 July
<year>2017</year>
].
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          30.
Amazon Web Services,
<article-title>"Amazon Web Services (AWS) - Cloud Computing Services,"</article-title>
Amazon.com, [Online]. Available: https://aws.amazon.com. [Accessed 22 June
<year>2017</year>
].
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          31.
Amazon Web Services,
<article-title>"AWS Free tier,"</article-title>
Amazon.com, [Online]. Available: https://aws.amazon.com/free/. [Accessed 09 June
<year>2017</year>
          ].
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          32.
          <string-name>
            <given-names>K.</given-names>
            <surname>Ganesan</surname>
          </string-name>
          ,
<article-title>"Text Mining, Analytics &amp; More: All About Stop Words for Text Mining and Information Retrieval,"</article-title>
[Online]. Available: http://textanalytics101.rxnlp.com/2014/10/all-about-stop-words-for-text-mining.html. [Accessed 07 July
<year>2017</year>
].
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          33.
          <string-name>
            <given-names>N.</given-names>
            <surname>Kaur</surname>
          </string-name>
          and
          <string-name>
            <given-names>M.</given-names>
            <surname>Kakkar</surname>
          </string-name>
          ,
          <article-title>"A balanced sentiment analysis approach with stemming porter for neutralized emotion weightage,"</article-title>
          <source>International Journal of Advanced Research in Computer and Communication Engineering</source>
          , vol.
          <volume>4</volume>
          , no.
          <issue>10</issue>
          , pp.
          <fpage>466</fpage>
          -
          <lpage>467</lpage>
          ,
October
<year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          34.
scikit-learn.org,
<article-title>"4.2. Feature extraction - scikit-learn 0.19.0 documentation,"</article-title>
[Online]. Available: http://scikit-learn.org/stable/modules/feature_extraction.html#feature-extraction. [Accessed 15 June
<year>2017</year>
].
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          35.
scikit-learn.org,
<article-title>"sklearn.feature_extraction.text.CountVectorizer - scikit-learn 0.19.0 documentation,"</article-title>
[Online]. Available: http://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.CountVectorizer.html. [Accessed 17 June
<year>2017</year>
].
        </mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
          36.
scikit-learn.org,
<article-title>"sklearn.feature_extraction.text.TfidfVectorizer - scikit-learn 0.19.0 documentation,"</article-title>
[Online]. Available: http://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.TfidfVectorizer.html. [Accessed 17 June
<year>2017</year>
].
        </mixed-citation>
      </ref>
      <ref id="ref37">
        <mixed-citation>
          37.
<string-name>
<given-names>H. M.</given-names>
<surname>Ismail</surname>
</string-name>
,
<string-name>
<given-names>S.</given-names>
<surname>Harous</surname>
</string-name>
and
<string-name>
<given-names>B.</given-names>
<surname>Belkhouche</surname>
</string-name>
,
<article-title>"A Comparative Analysis of Machine Learning Classifiers for Twitter Sentiment Analysis,"</article-title>
[Online]. Available: http://www.rcs.cic.ipn.mx/2016_110/A Comparative Analysis of Machine Learning Classifiers for Twitte Sentiment Analysis.pdf. [Accessed 12 July
<year>2017</year>
].
        </mixed-citation>
      </ref>
      <ref id="ref38">
        <mixed-citation>
          38.
          <string-name>
            <given-names>H.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <article-title>"The Optimality of Naive Bayes,"</article-title>
          [Online]. Available: http://www.cs.unb.ca/~hzhang/publications/FLAIRS04ZhangH.pdf.
[Accessed 29 June
          <year>2017</year>
          ].
        </mixed-citation>
      </ref>
      <ref id="ref39">
        <mixed-citation>
39. The Stanford Natural Language Processing Group,
<article-title>"Properties of Naive Bayes,"</article-title>
[Online]. Available: https://nlp.stanford.edu/IR-book/html/htmledition/properties-of-naive-bayes-1.html. [Accessed 27 June
<year>2017</year>
].
        </mixed-citation>
      </ref>
      <ref id="ref40">
        <mixed-citation>
          40.
scikit-learn.org,
<article-title>"sklearn.naive_bayes.MultinomialNB - scikit-learn 0.19.0 documentation,"</article-title>
[Online]. Available: http://scikit-learn.org/stable/modules/generated/sklearn.naive_bayes.MultinomialNB.html. [Accessed 01 July
<year>2017</year>
].
        </mixed-citation>
      </ref>
      <ref id="ref41">
        <mixed-citation>
41. scikit-learn.org,
<article-title>"3.2.4.3.1. sklearn.ensemble.RandomForestClassifier - scikit-learn 0.19.0 documentation,"</article-title>
[Online]. Available: http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html. [Accessed 03 July
<year>2017</year>
].
        </mixed-citation>
      </ref>
      <ref id="ref42">
        <mixed-citation>
          42.
scikit-learn.org,
<article-title>"sklearn.neighbors.KNeighborsClassifier - scikit-learn 0.19.0 documentation,"</article-title>
[Online]. Available: http://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KNeighborsClassifier.html. [Accessed 05 July
<year>2017</year>
].
        </mixed-citation>
      </ref>
      <ref id="ref43">
        <mixed-citation>
          43.
scikit-learn.org,
<article-title>"1.4. Support Vector Machines - scikit-learn 0.19.0 documentation,"</article-title>
[Online]. Available: http://scikit-learn.org/stable/modules/svm.html. [Accessed 07 July
<year>2017</year>
].
        </mixed-citation>
      </ref>
      <ref id="ref44">
        <mixed-citation>
          44. Department of Psychology, University of Toronto,
<article-title>"What are Artificial Neural Networks,"</article-title>
[Online]. Available: http://www.psych.utoronto.ca/users/reingold/courses/ai/cache/neural2.html. [Accessed 16 July
          <year>2017</year>
          ].
        </mixed-citation>
      </ref>
      <ref id="ref45">
        <mixed-citation>
          45. Stanford Engineering, Computer Science,
          <article-title>"CS231n Convolutional Neural Networks for Visual Recognition,"</article-title>
          [Online]. Available: http://cs231n.github.io/convolutional-networks/.
[Accessed 27 July
          <year>2017</year>
          ].
        </mixed-citation>
      </ref>
      <ref id="ref46">
        <mixed-citation>
          46.
Stanford,
<article-title>"UFLDL Tutorial,"</article-title>
[Online]. Available: http://ufldl.stanford.edu/tutorial/supervised/ConvolutionalNeuralNetwork/. [Accessed 28 July
          <year>2017</year>
          ].
        </mixed-citation>
      </ref>
      <ref id="ref47">
        <mixed-citation>
          47.
          <string-name>
            <given-names>D.</given-names>
            <surname>Britz</surname>
          </string-name>
          ,
<article-title>"Understanding Convolutional Neural Networks for NLP,"</article-title>
[Online]. Available: http://www.wildml.com/2015/11/understanding-convolutional-neural-networks-for-nlp/. [Accessed 27 July
          <year>2017</year>
          ].
        </mixed-citation>
      </ref>
      <ref id="ref48">
        <mixed-citation>
          48. cambridgespark.com,
<article-title>"Deep learning for complete beginners: convolutional neural networks with keras,"</article-title>
[Online]. Available: https://cambridgespark.com/content/tutorials/convolutional-neural-networks-with-keras/index.html. [Accessed 29 July
          <year>2017</year>
          ].
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>