<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Sentiments and Opinions From Twitter About Peruvian Touristic Places Using Correspondence Analysis</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Luis Cajachahua</string-name>
          <email>lcajachahua@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Indira Burga</string-name>
          <email>indira.burga@gmail.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Universidad Nacional de Ingeniería</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Universidad de Ingeniería y Tecnología</institution>
        </aff>
      </contrib-group>
      <fpage>178</fpage>
      <lpage>189</lpage>
      <abstract>
        <p>Tourism in Perú has become very important, as a growing number of tourists arrive each year. This paper focuses on understanding what English-speaking tourists take into consideration when they visit Perú. We obtained all the tweets published in English during 2016, filtered by the touristic places visited. In total, more than 192 thousand tweets were collected. We performed several analyses to describe the data, including correspondence analysis, a statistical technique that is normally applied to categorical data. The goal was to understand the sentiments and opinions expressed in those tweets.</p>
      </abstract>
      <kwd-group>
        <kwd>Twitter</kwd>
        <kwd>Tourism</kwd>
        <kwd>Sentiment Analysis</kwd>
        <kwd>Text Mining</kwd>
        <kwd>Correspondence Analysis</kwd>
        <kwd>Perú</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Tourism is an important source of economic
growth in Perú. In 2015, 3.5 million tourists
visited Perú and spent more than USD 4,151
million, equivalent to 3.75% of the gross
domestic product (
        <xref ref-type="bibr" rid="ref1">Promperú, 2016</xref>
        ). For this reason,
all the government tourism bureaus, especially
Promperú, are working hard to keep those
numbers growing (Figure 1: tourist arrivals to Perú in the last 10 years).
      </p>
      <p>In this context, every initiative aimed at
understanding tourists’ preferences and sentiments is
very valuable, because it helps to discover and
develop the main attributes of touristic places in
Perú, considering not only the place itself but the
whole ecosystem: hotels, travel agencies,
handicraft stores, transportation, public services, etc.</p>
      <p>
        For that reason, we downloaded 192,525
tweets published in English during 2016.
After careful data preparation
and feature extraction, we applied sentiment
analysis techniques to tag each tweet with the
following eight sentiments: anger, fear, sadness,
disgust, surprise, anticipation, trust and joy,
proposed by Plutchik
        <xref ref-type="bibr" rid="ref2">(Mohammad and Turney,
2010)</xref>
        . Using correspondence analysis, we
discovered associations between the touristic places
visited and the sentiments expressed in the extracted
tweets.
      </p>
      <p>In addition, we used other text mining
techniques to determine general concepts
mentioned in the tweets and to identify the associations
between the places visited and these concepts.</p>
    </sec>
    <sec id="sec-2">
      <title>Background</title>
      <p>
        Text mining has become a useful tool to deal
with the data deluge. With a few simple tools,
we can now extract valuable insights from the large
amount of unstructured data available today on the
internet. For example, one of the first studies of
sentiment analysis on Twitter was published in
2011 by Agarwal et al. They built a formula to
calculate the polarity of a tweet, that is, whether it is
positive or negative, and developed a tree kernel
approach to improve the classification
        <xref ref-type="bibr" rid="ref3">(Agarwal,
et al., 2011)</xref>
        .
      </p>
      <p>
        In the same year, Dodds et al. analyzed a huge
collection of tweets, around 4.6 billion, posted by over
63 million individual users over a
33-month span
        <xref ref-type="bibr" rid="ref4">(Dodds, et al., 2011)</xref>
        . The main goal
was to develop a dictionary optimized to measure
the level of happiness of a given text, e.g. another
tweet. They named their solution “the
Hedonometer”.
      </p>
      <p>
        One of the first studies about tourism using
tweets was carried out by Antoniadis et al. This study
focused on verifying the correlation between
Twitter performance and a tourism competitiveness
index for 38 destination management organizations
(DMOs). They found that Twitter use was in
accordance with the countries’ tourism performance
        <xref ref-type="bibr" rid="ref5">(Antoniadis, et al., 2014)</xref>
        .
      </p>
      <p>
        Oku et al. analyzed 4.5 million tweets to find
the locations where tourists posted photos. They
used only the geotagging information of the
pictures, leaving text data out of the analysis
        <xref ref-type="bibr" rid="ref6">(Oku, et
al., 2015)</xref>
        .
      </p>
      <p>
        In addition, Bassolas et al. analyzed the 20 most
interesting touristic
places around the world, including Machu Picchu.
They measured the “attractiveness” of each
place with two metrics: a) Radius, defined
as the average distance between the users’ places of
residence and the touristic site, and b) Coverage, the
area covered by the users’ places of residence,
computed as the number of distinct zones (or
countries) of residence
        <xref ref-type="bibr" rid="ref7">(Bassolas, et al., 2016)</xref>
        .
      </p>
      <p>All these studies were considered in developing a
different approach, focused on finding the
emotions expressed by tourists about the places
they visited.</p>
    </sec>
    <sec id="sec-3">
      <title>Objectives</title>
      <p>This research focused on five goals:</p>
      <list list-type="bullet">
        <list-item><p>Understand the sentiment (positive or negative) expressed in the tweet about the place.</p></list-item>
        <list-item><p>Tag each tweet with the sentiments that the user is expressing in it.</p></list-item>
        <list-item><p>Identify the main qualitative attributes mentioned in the tweets, related to the touristic place.</p></list-item>
        <list-item><p>Analyze the possible relationships between the places and the identified attributes.</p></list-item>
        <list-item><p>Explore the general activity of the tourists who use Twitter (frequent, occasional).</p></list-item>
      </list>
      <p>With those objectives in mind, we
determined the limits of our efforts, given the nature
of the information source and the tools used.</p>
    </sec>
    <sec id="sec-4">
      <title>Limitations</title>
      <p>There are various limitations to be considered:</p>
      <list list-type="bullet">
        <list-item><p>Twitter is an internet social network whose users must be registered and logged in to use it, and not every person or tourist in the world has a Twitter account. For this reason, the results are neither generalizable to nor representative of the tourist population.</p></list-item>
        <list-item><p>We acquired all the tweets directly from Twitter; we did not use streaming or scraping methods to download them. However, we used many filters to select tweets referring to touristic places. Consequently, we could have selected a non-representative sample of tweets.</p></list-item>
        <list-item><p>We considered only originally posted tweets, not retweets. However, many platforms post news or copies of tweets, e.g. IFTTT. Therefore, we could find the exact same tweet published by many different users.</p></list-item>
        <list-item><p>Some tweets (10%) include the location of the user, but this information is not always real, which makes classification by origin difficult. Besides, the georeferenced information is not representative.</p></list-item>
      </list>
    </sec>
    <sec id="sec-5">
      <title>Methodology</title>
      <p>Having reviewed the references and settled the
scope of the study, we selected some of the tools and
techniques available to analyze the information.</p>
    </sec>
    <sec id="sec-6">
      <title>Scope</title>
      <p>The population considered for this study
consisted of 192,525 tweets downloaded directly
from Twitter. The distribution of tweets is shown
in Figure 2.</p>
    </sec>
    <sec id="sec-7">
      <title>Text Mining</title>
      <p>
        Text Mining is the methodological process of
information extraction, where researchers deal
with collections of documents, using specialized
analysis methods
        <xref ref-type="bibr" rid="ref14">(Balbi, et al., 2012)</xref>
        .
      </p>
      <p>The complete process we followed is
shown in Figure 3. We collected tweets about 12
touristic places in Perú that were suggested by
Promperú, using words related to those places. We
used Python with the GNIP PowerTrack® API to
retrieve and download the information.</p>
      <p>The syntax used to obtain tweets was: (lang:en)
(machupicchu OR picchu OR sacsayhuaman OR
camino inca OR cusco OR cuzco OR plaza Armas
cusco OR plaza armas cuzco OR ollataytambo
OR salinas maras OR qorikancha OR coricancha
OR ccoricancha OR pisac OR aguas calientes OR
moray OR valle sagrado OR nasca lineas OR
nazca lineas OR paracas OR islas ballestas OR
manglares tumbes OR puerto pizarro OR parque
reserva lima OR larcomar OR plaza armas lima
OR parque kennedy OR catedral lima OR
pachacamac OR monasterio santa catalina OR cañon
colca OR colca OR uros OR taquile OR sillustani
OR cathedral cusco OR cathedral cuzco OR main
square lima OR nazca lines OR nasca lines OR
ballestas island OR main square cusco OR square
kennedy OR santa catalina monastery OR colca
canyon OR sacred valley OR inca trail OR inKa
trail OR cordillera blanca OR huaraz OR
llanganuco OR amazon river peru OR mancora OR
vichayito OR madre dios OR manu park OR iquitos
OR tarapoto OR kuelap OR moche OR titicaca
OR lake 69 OR laguna 69 OR señor sipan OR
lord sipan OR pullana OR misti volcan OR misti
volcano OR yanahuara OR huascaran OR churup
OR chavin OR mochica OR pacaya samiria OR
tucume OR cumbemayo OR baños inca OR wari
OR punta sal OR amotape OR catacaos OR
bahuaja sonene OR cotahuasi OR tingo maria)</p>
      <p>During the initial pre-processing of the datasets, all
tweets were labelled with the cities and touristic
places they mention, by matching words related to the places.
For example, if the text contained the words
“Sacsayhuaman” and “Chan-Chan”, we put 1 in
the columns “Cusco” and “La Libertad”, meaning that
the tourist is talking about attractions in two
different regions, and 0 in all other columns,
corresponding to the other Peruvian regions.</p>
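      <p>The labelling rule can be illustrated with a minimal sketch. It assumes a keyword list per region; the dictionary below is hypothetical and contains only a few illustrative entries, not the full list used in the study.</p>
      <preformat><![CDATA[
```python
import re

# Hypothetical keyword-to-region map with a few illustrative entries only.
REGION_KEYWORDS = {
    "Cusco": ["sacsayhuaman", "machu picchu", "machupicchu", "cusco", "cuzco"],
    "La Libertad": ["chan-chan", "chan chan", "moche"],
    "Puno": ["titicaca", "uros", "taquile"],
}


def label_regions(text):
    """Return 1 for every region with at least one keyword in the text, else 0."""
    lowered = text.lower()
    labels = {}
    for region, keywords in REGION_KEYWORDS.items():
        # Word boundaries avoid matching a keyword inside a longer word.
        hit = any(
            re.search(r"\b" + re.escape(keyword) + r"\b", lowered)
            for keyword in keywords
        )
        labels[region] = int(hit)
    return labels


example = label_regions("Visited Sacsayhuaman and Chan-Chan this week")
```
]]></preformat>
      <p>In this sketch a tweet can receive a 1 in several region columns at once, which matches the example of a tourist mentioning attractions in two regions.</p>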
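      <p>Not every tweet is clear and correctly written, so
we had to perform a cleaning process with
the following steps:</p>
      <list list-type="alpha-lower">
        <list-item><p>Drop retweets, hashtags and mentions.</p></list-item>
        <list-item><p>Drop strange symbols.</p></list-item>
        <list-item><p>Drop web addresses.</p></list-item>
        <list-item><p>Drop punctuation marks.</p></list-item>
        <list-item><p>Drop numbers.</p></list-item>
        <list-item><p>Drop tabs and multiple spaces.</p></list-item>
      </list>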
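      <p>The cleaning steps can be sketched with regular expressions. This is an illustration under our own assumptions, not the exact code used in the study: retweets are detected by the “RT @” prefix, and “strange symbols” are taken to mean non-ASCII characters.</p>
      <preformat><![CDATA[
```python
import re


def clean_tweet(text):
    """Apply cleaning steps (a) to (f) to one tweet; return None for retweets."""
    if text.startswith("RT @"):                          # (a) drop retweets
        return None
    # URLs are removed first so later steps do not leave fragments of them.
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)   # (c) drop web addresses
    text = re.sub(r"[@#]\w+", " ", text)                 # (a) drop mentions, hashtags
    text = text.encode("ascii", "ignore").decode()       # (b) drop non-ASCII symbols
    text = re.sub(r"[^\w\s]|_", " ", text)               # (d) drop punctuation
    text = re.sub(r"\d+", " ", text)                     # (e) drop numbers
    text = re.sub(r"\s+", " ", text).strip()             # (f) collapse whitespace
    return text


cleaned = clean_tweet("Loving #Cusco with @ana!! 3 days\tleft http://t.co/abc")
```
]]></preformat>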
      <p>
        We frequently found spelling mistakes in the
tweets. For that reason, we used a spelling
corrector; e.g. we replaced ‘speling’ with ‘spelling’ using
an algorithm in Python
        <xref ref-type="bibr" rid="ref19">(Dean and Bill 2007)</xref>
        . For
stemming we used SAS Text Miner and
NLTK
        <xref ref-type="bibr" rid="ref18">(Bird, et al., 2009)</xref>
        . Finally, we removed
the most common stopwords, such as ‘a’, ‘the’ and
‘an’, from the datasets.
      </p>
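      <p>As an illustration of spelling correction followed by stopword removal, the sketch below uses a simple edit-distance-one corrector. It is one common approach and only an assumption about how such a corrector can work; the word frequencies and the stopword set are hypothetical toy values.</p>
      <preformat><![CDATA[
```python
from collections import Counter

# Hypothetical word frequencies; a real corrector would count a large corpus.
WORD_COUNTS = Counter({"spelling": 50, "spewing": 2, "travel": 80, "the": 500})
STOPWORDS = {"a", "an", "the"}
LETTERS = "abcdefghijklmnopqrstuvwxyz"


def edits1(word):
    """All strings one edit (delete, transpose, replace, insert) away from word."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:] for left, right in splits if right
                for c in LETTERS]
    inserts = [left + c + right for left, right in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)


def correct(word):
    """Keep known words; otherwise pick the most frequent known word one edit away."""
    if word in WORD_COUNTS:
        return word
    candidates = [w for w in edits1(word) if w in WORD_COUNTS]
    return max(candidates, key=WORD_COUNTS.get) if candidates else word


def normalize(tokens):
    """Correct each token, then drop stopwords."""
    corrected = [correct(token) for token in tokens]
    return [token for token in corrected if token not in STOPWORDS]
```
]]></preformat>
      <p>For instance, normalize(["the", "speling", "travel"]) keeps the corrected word ‘spelling’ and ‘travel’ and drops the stopword.</p>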
      <p>After the cleaning process, we explored the
dataset and analyzed the comments using
several R packages (qdap, koRpus, tm). SAS
Text Miner was helpful for topic modeling and
term relationship diagrams. Finally, R was used
for sentiment classification (library syuzhet) and
correspondence analysis (function corresp).</p>
      <p>
        We pre-defined some “concepts” using
the Text Profile node of SAS Text Miner
        <xref ref-type="bibr" rid="ref13">(SAS
Institute Inc. 2013)</xref>
        . It fits a hierarchical
Bayesian model that predicts the concepts
associated with each touristic place. After lemmatization, we grouped
terms into concepts
associated with positive and negative characteristics of
the attractions visited.
      </p>
      <p>In our case, we identified all these words
considering verbs, which represent the activities that
tourists do in the places, and adjectives, which
represent the opinions or value perceived by the tourist.</p>
      <p>Figure 3: Process diagram. Stages: tweets collection, pre-processing, tagging and classification, characterization. Steps shown: download tweets, format, preparation, cleaning, terms filtering, orthographic review, tagging/classification, topic modeling, term relationships, perceptual maps, correspondence analysis, wordclouds.</p>
      <sec id="sec-7-3">
        <title>Sentiment Analysis and Correspondence Analysis</title>
        <p>
          The first sentiment analyses performed on text
data were published around 2002. As Agarwal et al.
stated, most of those studies focused on
classifying text into positive or negative categories
          <xref ref-type="bibr" rid="ref3">(Agarwal, et al., 2011)</xref>
          .
        </p>
        <p>
          One of the first teams to perform sentiment
analysis with Twitter data was led by
O’Connor. They also considered only the polarity
classification of 1 billion Twitter messages, but looked at
the changes over time
          <xref ref-type="bibr" rid="ref8">(O’Connor, et al., 2010)</xref>
          .
        </p>
        <p>
          Mohammad reviewed the history of
sentiment analysis with a focus on emotions
          <xref ref-type="bibr" rid="ref2">(Mohammad, et al., 2010)</xref>
          . They consider the
Plutchik classification of emotions (Figure 4).
Plutchik stated that emotions, like colors in color theory,
can be combined to produce others. This
makes it possible to calculate scores for a given text and
to classify it into any of those combinations.
        </p>
        <p>
          In fact, some tools include a process that uses
these concepts to classify texts, for example the
syuzhet library in R. This library calls some
others to score
sentiments and calculate polarity. With this
library, the function get_sentiment takes
two arguments (text and method, the default being
‘syuzhet’) and returns a polarity score for the
text, while the companion function get_nrc_sentiment
scores the text in each of the eight main sentiment
categories listed in Figure 4. The highest score
indicates the most prevalent sentiment. The
default method uses a sentiment lexicon
developed in the Nebraska Literary Lab under the
direction of Matthew L. Jockers
          <xref ref-type="bibr" rid="ref10">(Jockers, 2015)</xref>
          .
        </p>
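        <p>The idea of lexicon-based emotion scoring can be sketched as follows. This is not the syuzhet implementation: the toy lexicon is hypothetical, and the score of each emotion is simply the number of lexicon words in the text that are associated with it.</p>
        <preformat><![CDATA[
```python
from collections import Counter

EMOTIONS = ["anger", "anticipation", "disgust", "fear",
            "joy", "sadness", "surprise", "trust"]

# Hypothetical toy lexicon: each word maps to the emotions it evokes.
TOY_LEXICON = {
    "amazing": ["joy", "surprise"],
    "dead": ["sadness", "fear"],
    "polluted": ["disgust", "anger"],
    "guide": ["trust"],
    "tomorrow": ["anticipation"],
}


def emotion_scores(text):
    """Count, for each of the eight emotions, the lexicon words found in the text."""
    counts = Counter()
    for token in text.lower().split():
        for emotion in TOY_LEXICON.get(token, []):
            counts[emotion] += 1
    return {emotion: counts[emotion] for emotion in EMOTIONS}


def dominant_emotions(scores):
    """Return the emotions tied for the highest non-zero score."""
    top = max(scores.values())
    if top == 0:
        return []
    return [emotion for emotion, score in scores.items() if score == top]


scores = emotion_scores("amazing guide amazing view")
```
]]></preformat>
        <p>A tweet with no lexicon words gets a score of zero in every category and therefore no dominant emotion.</p>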
        <p>
          Perhaps the most representative researcher in
correspondence analysis is Benzécri. He
did not invent the technique, but he consolidated
the methods involved (
          <xref ref-type="bibr" rid="ref11">Benzécri, 1973</xref>
          ). He had been
working since the early 1960s on categorical data
analysis and mathematical linguistics in French.
        </p>
        <p>Correspondence analysis was created as a
statistical dimension-reduction technique for
qualitative data. It is very useful for understanding
the numeric associations between two categorical
factors. After performing a matrix decomposition
of the analyzed crosstab, the
original frequencies can be represented in a two-dimensional space.</p>
        <p>As stated in the book by Venables and Ripley,
after applying the singular value
decomposition (SVD) it is easy to calculate the row and
column scores.</p>
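        <p>Suppose we have an r × c table N of counts.
Correspondence analysis seeks ‘scores’ f and g for
the rows and columns which are maximally
correlated. Clearly the maximum correlation is one,
attained by constant scores, so we seek the largest
non-trivial solution. Let R and C be matrices of
the group indicators of the rows and columns, so
R<sup>T</sup>C = N. Consider the singular value
decomposition (SVD) of their correlation matrix:
          <disp-formula>
            <label>(1)</label>
            <tex-math><![CDATA[X_{ij} = \frac{n_{ij}/n - (n_{i\cdot}/n)(n_{\cdot j}/n)}{\sqrt{(n_{i\cdot}/n)(n_{\cdot j}/n)}} = \frac{n_{ij} - n\, r_i c_j}{n \sqrt{r_i c_j}}]]></tex-math>
          </disp-formula>
        </p>
        <p>
          where <inline-formula><tex-math><![CDATA[r_i = n_{i\cdot}/n]]></tex-math></inline-formula> and <inline-formula><tex-math><![CDATA[c_j = n_{\cdot j}/n]]></tex-math></inline-formula> are the
proportions in each row and column. Let D<sub>r</sub> and D<sub>c</sub>
be the diagonal matrices of r and c.
Correspondence analysis corresponds to selecting the first
singular value and the left and right singular
vectors of X<sub>ij</sub> and rescaling by D<sub>r</sub><sup>−1/2</sup> and D<sub>c</sub><sup>−1/2</sup>,
respectively
          <xref ref-type="bibr" rid="ref12">(Venables and Ripley, 2002)</xref>
          .
        </p>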
        <p>In our case (see Tables 1 and 2), the first input
matrix is:</p>
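        <table-wrap>
          <label>Table 1</label>
          <caption><p>Distribution of tweets by touristic places and sentiments expressed.</p></caption>
          <table>
            <thead>
              <tr><th>Place</th><th>anger</th><th>anticipation</th><th>disgust</th><th>fear</th><th>joy</th><th>sadness</th><th>surprise</th><th>trust</th></tr>
            </thead>
            <tbody>
              <tr><td>MachuPicchu</td><td>7,679</td><td>31,455</td><td>6,596</td><td>11,297</td><td>21,302</td><td>10,791</td><td>17,428</td><td>22,911</td></tr>
              <tr><td>CaminoInca</td><td>799</td><td>4,763</td><td>351</td><td>1,290</td><td>2,992</td><td>988</td><td>1,879</td><td>3,164</td></tr>
              <tr><td>ValleSagrado</td><td>461</td><td>2,153</td><td>331</td><td>733</td><td>2,095</td><td>529</td><td>962</td><td>1,830</td></tr>
              <tr><td>AguasCalientes</td><td>114</td><td>1,269</td><td>57</td><td>104</td><td>253</td><td>74</td><td>1,022</td><td>278</td></tr>
              <tr><td>Sacsayhuaman</td><td>55</td><td>780</td><td>44</td><td>136</td><td>708</td><td>99</td><td>749</td><td>943</td></tr>
              <tr><td>L.Titicaca</td><td>1,521</td><td>2,889</td><td>1,434</td><td>3,399</td><td>2,553</td><td>2,290</td><td>2,339</td><td>2,547</td></tr>
            </tbody>
          </table>
        </table-wrap>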
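        <p>The other matrix is:</p>
        <table-wrap>
          <label>Table 2</label>
          <caption><p>Distribution of tweets by touristic places and concepts.</p></caption>
          <table>
            <thead>
              <tr><th>Place</th><th>amazing</th><th>beautiful</th><th>best</th><th>excellent</th><th>good</th><th>love</th><th>bad</th><th>old</th></tr>
            </thead>
            <tbody>
              <tr><td>MachuPicchu</td><td>6,421</td><td>2,037</td><td>2,521</td><td>1,154</td><td>1,701</td><td>1,482</td><td>244</td><td>1,880</td></tr>
              <tr><td>CaminoInca</td><td>695</td><td>153</td><td>493</td><td>368</td><td>405</td><td>152</td><td>97</td><td>275</td></tr>
              <tr><td>ValleSagrado</td><td>464</td><td>361</td><td>143</td><td>111</td><td>109</td><td>106</td><td>8</td><td>335</td></tr>
              <tr><td>AguasCalientes</td><td>48</td><td>20</td><td>73</td><td>10</td><td>24</td><td>12</td><td>6</td><td>4</td></tr>
              <tr><td>Sacsayhuaman</td><td>47</td><td>10</td><td>10</td><td>15</td><td>5</td><td>11</td><td>3</td><td>58</td></tr>
              <tr><td>L.Titicaca</td><td>334</td><td>230</td><td>66</td><td>64</td><td>127</td><td>127</td><td>32</td><td>177</td></tr>
            </tbody>
          </table>
        </table-wrap>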
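        <p>The computation described by Venables and Ripley can be sketched with NumPy on the counts of Table 1. This is an illustrative sketch rather than the R code used in the study (function corresp); the function name and the choice of two dimensions are our own assumptions.</p>
        <preformat><![CDATA[
```python
import numpy as np

places = ["MachuPicchu", "CaminoInca", "ValleSagrado",
          "AguasCalientes", "Sacsayhuaman", "L.Titicaca"]
sentiments = ["anger", "anticipation", "disgust", "fear",
              "joy", "sadness", "surprise", "trust"]
# Counts copied from Table 1 (rows: places, columns: sentiments).
N = np.array([
    [7679, 31455, 6596, 11297, 21302, 10791, 17428, 22911],
    [799, 4763, 351, 1290, 2992, 988, 1879, 3164],
    [461, 2153, 331, 733, 2095, 529, 962, 1830],
    [114, 1269, 57, 104, 253, 74, 1022, 278],
    [55, 780, 44, 136, 708, 99, 749, 943],
    [1521, 2889, 1434, 3399, 2553, 2290, 2339, 2547],
], dtype=float)


def correspondence_analysis(table, n_dims=2):
    """Return singular values and row/column scores for the first n_dims."""
    n = table.sum()
    r = table.sum(axis=1) / n                 # row proportions r_i
    c = table.sum(axis=0) / n                 # column proportions c_j
    expected = np.outer(r, c)
    # Standardized residuals, as in equation (1).
    X = (table / n - expected) / np.sqrt(expected)
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    row_scores = U[:, :n_dims] / np.sqrt(r)[:, None]      # rescale by D_r^(-1/2)
    col_scores = Vt.T[:, :n_dims] / np.sqrt(c)[:, None]   # rescale by D_c^(-1/2)
    return s[:n_dims], row_scores, col_scores


singular_values, row_scores, col_scores = correspondence_analysis(N)
```
]]></preformat>
        <p>The two columns of row_scores and col_scores give the coordinates of places and sentiments on the first two dimensions, which is what a correspondence plot displays.</p>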
      </sec>
    </sec>
    <sec id="sec-8">
      <title>Correspondence Analysis Plot</title>
      <p>After applying the SVD to each matrix, we get the
row and column scores. We can use them to
generate the correspondence plot, by drawing
a scatterplot of the scores calculated for
each matrix.</p>
      <p>
        Some researchers have used this kind of chart
in text analysis before, with different
approaches. For example, Balbi et al. carried out two
studies
        <xref ref-type="bibr" rid="ref14 ref15">(Balbi, et al.,2012; Balbi, et al.,2013)</xref>
        ,
drawing plots to model language. In addition, they
analyzed relationships between terms to
understand the meaning of the texts, as shown in Figure 5.
      </p>
      <p>Interestingly, they first used all the verbs and
adjectives of the analyzed texts to find the
words, called “concepts”, that express their key
characteristics.</p>
    </sec>
    <sec id="sec-9">
      <title>Results</title>
      <p>Having downloaded more than 500 thousand
tweets, pre-filtered by touristic places and content,
we needed to filter the data more carefully,
because it contained many tweets about
other subjects. Consequently, we discarded:
a) retweets;
b) tweets without text, only pictures;
c) tweets in other languages.</p>
      <p>After this process, we had 192,525 tweets,
published by only 70,161 unique users. Even though
Twitter is a popular social network, users have
different patterns of activity. We considered as “active
users” those who published at least 12 tweets over
the whole span analyzed, from January to December
2016, that is, at least one tweet per month on average.
We found that only 2.5% of all users were
active (Figure 6).</p>
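      <p>The active-user rule can be expressed in a few lines. The sketch assumes one user identifier per tweet as input; the sample data are hypothetical.</p>
      <preformat><![CDATA[
```python
from collections import Counter

ACTIVE_THRESHOLD = 12  # at least 12 tweets in the year, one per month on average


def active_user_share(user_ids):
    """Given one user id per tweet, return (unique_users, active_users, share)."""
    tweets_per_user = Counter(user_ids)
    unique_users = len(tweets_per_user)
    active_users = sum(
        1 for count in tweets_per_user.values() if count >= ACTIVE_THRESHOLD
    )
    return unique_users, active_users, active_users / unique_users


# Hypothetical toy data: one very active account and three occasional ones.
sample = ["agency"] * 15 + ["ana", "ben", "ben", "carl"]
unique_users, active_users, share = active_user_share(sample)
```
]]></preformat>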
      <p>Figure 6: Tweets (192,525), unique users (70,161) and active users.</p>
      <sec id="sec-9-3">
        <title>Active Users</title>
        <p>After that, we identified the most active users,
in order to discover specialized users, bots (robots
or automatic processes that publish tweets) and
other strange or interesting patterns. At the
top we found some tourism companies, users
who love to travel, accounts for travelers, etc.,
as shown in Table 3.</p>
        <p>Kondor Path Tours (@kondorpathtours) is a
travel agency that is very active on Twitter, publishing
five times more tweets than the next user on the
list. L. J. Blake (@lookeastwest) is a travel
blogger who travels around the world and posts photos,
reviews of places and advice for other travelers.</p>
        <p>After performing the tweet cleaning process
explained in Section 3.2 (Text Mining), we obtained the most
frequent words in the corpus. In English tweets,
Machu Picchu is more popular than Perú or any
other Peruvian place or word, and it appears again among
the top entries of the list as “Machupicchu”
(Table 4).</p>
        <p>If we use wordclouds, the visualization is not
very helpful, because the most frequent words
hide all the others (Figure 7).</p>
        <p>We filtered out some words to see the picture more
clearly; in our case we removed machu picchu,
machupicchu, peru, cusco, travel, inca and trail
(Figure 8).</p>
        <p>
          Attitude Valuation: After the cleaning process,
we used the syuzhet package to classify each tweet as
positive or negative, depending on the polarity
expressed. This method is not 100% accurate, but
it gives a general idea. Another limitation is that
tweets have only 140 characters, a limited way to
express complex feelings
          <xref ref-type="bibr" rid="ref10">(Jockers, 2015)</xref>
          .
        </p>
        <p>Figure 9: Share of negative, neutral and positive tweets by region; the panels include Cusco (160,193), Puno (11,289) and Ica (8,442).</p>
        <p>We analyzed the opinions in all tweets related
to touristic places in Perú and then considered each
region separately,
counting positive, negative and neutral
tags. According to the general results, Cusco is the
region with the highest percentage of positive
tweets and Lima leads in neutral
opinions (Figure 9).</p>
        <p>We could see that, in general, many
tweets are classified as neutral. This is common in text
classification, due to the nature of social media.
The most frequent case is a user talking about
news or things that are happening, for example
“I’m in Cusco” or “The truth about Paracas
Skulls”. In our context, trying to understand the
attitude of the tourist, we can simplify the
classification from three levels to two: negative
opinions and non-negative opinions, the latter grouping
positive and neutral ones.</p>
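        <p>The three-level and two-level classifications can be sketched from a signed polarity score. Using zero as the cut-off between classes is our assumption for illustration, and the example scores are hypothetical.</p>
        <preformat><![CDATA[
```python
from collections import Counter


def polarity_label(score):
    """Map a signed polarity score to Negative, Neutral or Positive."""
    if score < 0:
        return "Negative"
    if score > 0:
        return "Positive"
    return "Neutral"


def is_negative(score):
    """Two-level version: Neutral and Positive are grouped as non-negative."""
    return polarity_label(score) == "Negative"


def label_shares(scores):
    """Return the share of each three-level label in a list of scores."""
    counts = Counter(polarity_label(score) for score in scores)
    return {label: counts[label] / len(scores)
            for label in ("Negative", "Neutral", "Positive")}


# Hypothetical polarity scores for four tweets.
shares = label_shares([-0.5, 0.0, 0.0, 1.25])
```
]]></preformat>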
        <p>
          Puno has the highest rate of negative tweets,
because in early October about ten thousand giant
frogs were found dead in Lake Titicaca.
According to journalists, pollution in the Coata River
would be to blame for the deaths. This triggered many
complaints and campaigns on social networks. Some
protesting activists took around 100 dead frogs to
the central square of the regional capital, Puno. A
BBC article describing the disaster was shared
many times on Twitter
          <xref ref-type="bibr" rid="ref16">(BBC News. 2016)</xref>
          .
        </p>
        <p>Touristic Places: Considering the most visited
places mentioned by Promperú, we looked for
negative and non-negative opinions (Table 5).</p>
        <p>The place with the most negative tweets is Lake
Titicaca. The places with the best performance are
Valle Sagrado, Aguas Calientes and Caminos del
Inca, in Cusco. Machu Picchu performs well
too, but some incidents affected its
reputation on Twitter.</p>
        <p>Parque Kennedy and Larcomar, in Lima, have a
good image too. The district of Miraflores is very
touristic: it has recognized the great potential of
tourism in Perú, and many good hotels and restaurants
are in the area.</p>
        <p>
          Concept Identification: Using hierarchical
Bayesian methods, with the option ‘Text Rules’ in
SAS Enterprise Miner, we identified the words
that appeared most often in negative tweets. It
uses rules based on word co-occurrence to
perform a simple classification
          <xref ref-type="bibr" rid="ref13">(SAS Institute Inc.
2013)</xref>
          .
        </p>
        <p>We focused the analysis on each touristic place
in order to find the specific words
associated with it. Some examples are shown in
Figure 10.</p>
        <p>Those words are meaningful because they reflect
incidents that occurred in Machu Picchu. A
German tourist died in October while trying to take
a picture of himself: he fell off a cliff in the
mountains. In addition, we found tweets about
“andean astronomers” or “shamans” who perform
a ritual known as “Ayahuasca”, which consists of
drinking an Amazonian plant mixture that is
capable of inducing altered states of consciousness,
usually lasting between 4 and 8 hours after
ingestion. These tweets could be related to bad or,
at least, strange experiences.</p>
        <p>In the other direction, we also selected
words that appeared in non-negative tweets. Most
of them describe good experiences
with the landscapes, trekking routes and people
(Figure 11).</p>
        <p>In general, tourists enjoy ‘beautiful’ natural
landscapes, ‘spectacular’ ancient buildings, ruins,
food and services, and ‘colorful’ handicrafts and
traditional clothes. They also liked to go hiking across the
valley. All those words were collected for
the next task, which is grouping the most relevant
expressions into concepts, using co-occurrence and
synonyms.</p>
        <p>
          Relationships between terms: To have a
better understanding of the words and concepts
discovered in the previous step, we performed a link
analysis between terms A and B, using a single
metric called strength
          <xref ref-type="bibr" rid="ref13">(SAS Institute Inc. 2013)</xref>
          :
          <disp-formula>
            <label>(2)</label>
            <tex-math><![CDATA[\mathrm{Strength} = \log_e\left(1 / \mathrm{Prob}_k\right)]]></tex-math>
          </disp-formula>
          <disp-formula>
            <label>(3)</label>
            <tex-math><![CDATA[\mathrm{Prob}_k = \sum_{r=k}^{n} \mathrm{Prob}(r)]]></tex-math>
          </disp-formula>
          <disp-formula>
            <label>(4)</label>
            <tex-math><![CDATA[\mathrm{Prob}(r) \sim \mathrm{Binomial}(n, p)]]></tex-math>
          </disp-formula>
          where n is the number of documents that contain term B and
k is the number of documents containing both A and B.
        </p>
        <p>Then we can build graphs to represent
the strongest relationships. In the graph
about the negative experiences, the thickness of
the lines represents the degree of relationship. We
can build the same graphs for non-negative terms
to find things that tourists enjoy or like to do, as
shown in Figure 13.</p>
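        <p>A minimal sketch of the strength metric follows, assuming it is the logarithm of the inverse of a binomial tail probability: the probability of observing k or more co-occurrences among n documents. The success probability p is treated here as an input chosen by the analyst, which is our assumption; the function name is hypothetical.</p>
        <preformat><![CDATA[
```python
from math import comb, log


def strength(n, k, p):
    """Return log(1 / Prob_k), where Prob_k = P(X >= k) and X ~ Binomial(n, p).

    n: number of documents that contain term B
    k: number of documents that contain both A and B
    p: assumed probability of a co-occurrence in a single document
    """
    if not 0 < p < 1:
        raise ValueError("p must be strictly between 0 and 1")
    if not 0 <= k <= n:
        raise ValueError("k must be between 0 and n")
    # Sum the binomial probabilities from r = k up to r = n.
    prob_k = sum(comb(n, r) * p**r * (1 - p) ** (n - r) for r in range(k, n + 1))
    return log(1 / prob_k)
```
]]></preformat>
        <p>Under this reading, the more co-occurrences are observed relative to what p would suggest, the smaller the tail probability and the larger the strength, so thicker lines in the graphs correspond to less expected co-occurrence counts.</p>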
        <p>
          Emotions discovery: The next step deals
with sentiments. To classify all the tweets,
we used the syuzhet package
          <xref ref-type="bibr" rid="ref10">(Jockers, 2015)</xref>
          again
to identify the eight main emotions. After that, we
plotted the distribution of emotions for each place
visited. Some of them are shown in Figure 14.
        </p>
        <p>Figure 14: Number of tweets per emotion (joy, anticipation, anger, disgust, sadness, surprise, fear, trust) for selected places; the panels include Machu Picchu, Lago Titicaca and Parque Kennedy.</p>
        <p>We can notice the differences between the
best-accepted places, like Caminos del Inca or Parque
Kennedy, and the others. For the former, the percentage of tweets that contain
bad feelings like anger, disgust, sadness or fear is
really low compared with the positive ones. In
contrast, Lago Titicaca shows smaller differences between
the sentiments expressed in the tweets, and the most
frequent one is fear.</p>
        <p>Correspondence Analysis: We now have the
key elements to build the crosstabs, perform the
correspondence analysis and draw the plots. First,
we can observe the distribution of sentiments per
place visited.</p>
        <p>Figure 15: Percentage distribution (0% to 100%) of the eight sentiments (joy, anticipation, anger, disgust, sadness, surprise, fear, trust) per place. First panel: MachuPicchu, CaminoInca, ValleSagrado, AguasCalientes, Sacsayhuaman, L.Titicaca. Second panel: Iquitos, R.Amazonas, R.Paracas, Lin.Nasca, Pq.Kennedy, Larcomar.</p>
        <p>
          We divided the visualization into two parts,
shown in Figure 15, because it is hard to represent
twelve places in one chart. With the complete
picture, we can see the predominant emotions by
place. For example, Aguas Calientes has many
tweets expressing surprise and anticipation. Lago
Titicaca is associated with expressions of fear and
sadness. Lineas de Nasca has many words
related to fear, like “mysterious” or “unknown”. Other
similar reactions are caused by accidents
that occur to aircraft with alarming frequency
          <xref ref-type="bibr" rid="ref17">(StudioKnow, 2016)</xref>
          . In fact, posts like these
were spread on Twitter, accompanied by other
alarming messages.
        </p>
        <p>The next step is to reduce dimensionality by
calculating the row and column scores from
Tables 1 and 2. With the scores, we
built the correspondence plots. Like the bar plots
in Figure 14, we divided the representation into two
blocks to keep the results legible. We can see the
sentiments associated with each place in the plots.</p>
        <p>Figure 16: Correspondence plots of places and sentiments (axes: Dimension 1 and Dimension 2), in two panels.</p>
        <p>The first plot in Figure 16 is very clear about
Lago Titicaca. All the negative emotions
(‘disgust’, ‘fear’, ‘sadness’ and ‘anger’) are close to this
place, which is interpreted as a relationship. The
words ‘trust’ and ‘joy’ are more related to places
such as Valle Sagrado and Camino Inca. Given the
situation with pollution and the death of
animals in Puno explained above, the relationship could be
confirmed. In the case of Cusco, the local
government and authorities are aware of the
importance of providing the best experiences to
visitors, so they and the tourism businesses
(operators, hotels, restaurants, guides, etc.) work
together to offer good services and hospitality.</p>
        <p>In the second plot in Figure 16, we can see the
proximity between Lineas de Nasca and the
sentiment ‘fear’, between Rio Amazonas and ‘surprise’, and
between Paracas, Parque Kennedy and Larcomar
and the sentiments ‘joy’ and ‘trust’. In the case
of Nasca, many tourists commented on the
lack of safety of flying in old and outdated
airplanes.</p>
        <p>Figure 17: Correspondence plots of places and concepts (axes: Dimension 1 and Dimension 2), in two panels. Concepts shown: amazing, bad, beautiful, best, enjoy, excellent, fun, good, happy, love, luxury, old, small, wait.</p>
        <p>Regarding the concepts, we selected
some non-negative ones: Amazing, Best,
Beautiful, Enjoy, Excellent, Fun, Good, Happy, Love
and Luxury. The negative concepts
selected were Bad, Old, Small and Wait. All of them
appeared frequently in the
tweets (see Figure 17). Users employ them to describe or
evaluate their experiences in the
places visited. Each concept summarizes other common
expressions, to avoid synonym issues in the
interpretation of the texts; the expressions were grouped using the linking
formula (1).</p>
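        <p>As an illustration of how variant expressions might be folded into the selected concepts before counting them per place, consider the minimal sketch below. The concept names are those listed in the text, while the variant words mapped onto them, the sample tweets and the function name are hypothetical; the sketch does not reproduce the grouping obtained with formula (1).</p>
        <preformat><![CDATA[
```python
from collections import Counter
import re

# Concepts named in the text; the variant words mapped onto them are
# hypothetical examples, not the grouping produced by formula (1).
CONCEPT_OF = {
    "amazing": "amazing", "awesome": "amazing",
    "beautiful": "beautiful", "gorgeous": "beautiful",
    "bad": "bad", "awful": "bad",
    "old": "old", "small": "small",
    "wait": "wait", "waiting": "wait",
}


def concept_counts(tweets):
    """Count canonical concepts per place from (place, text) pairs."""
    table = {}
    for place, text in tweets:
        words = re.findall(r"[a-z]+", text.lower())
        concepts = [CONCEPT_OF[w] for w in words if w in CONCEPT_OF]
        table.setdefault(place, Counter()).update(concepts)
    return table


# Made-up tweets for illustration only.
sample = [
    ("MachuPicchu", "Awesome views, simply amazing!"),
    ("MachuPicchu", "Waiting two hours for the bus"),
    ("Lin.Nasca", "The plane felt old and small"),
]
table = concept_counts(sample)
for place, counter in table.items():
    print(place, dict(counter))
```
]]></preformat>
        <p>The resulting place-by-concept counts have the shape of the contingency table that a correspondence analysis takes as input.</p>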
        <p>Geolocation: In addition, we plotted all
the geo-tagged tweets on a map. Only 17 thousand
tweets had longitude and latitude coordinates.
Note that the coordinates correspond to the user's
location at the moment the tweet was
published. For that reason, 92% of these tweets were
posted from somewhere in Perú, as shown in Figure
18.</p>
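        <p>A share such as the one above can be estimated by testing each coordinate pair against the country's extent. The sketch below uses a rectangular bounding box whose limits are approximate values assumed for this illustration; a box is a crude test, since it also covers parts of neighbouring countries, and the sample coordinates and function names are made up.</p>
        <preformat><![CDATA[
```python
# Approximate bounding box for Perú in decimal degrees (an assumption made
# for this sketch; it also covers parts of neighbouring countries).
PERU_BOX = {"lat": (-18.4, 0.0), "lon": (-81.4, -68.6)}


def in_peru_box(lat, lon):
    """Return True when the coordinate falls inside the bounding box."""
    lat_min, lat_max = PERU_BOX["lat"]
    lon_min, lon_max = PERU_BOX["lon"]
    return lat_min <= lat <= lat_max and lon_min <= lon <= lon_max


def share_inside(coords):
    """Fraction of (lat, lon) pairs that fall inside the box."""
    if not coords:
        return 0.0
    return sum(in_peru_box(lat, lon) for lat, lon in coords) / len(coords)


# Made-up coordinates: Cusco, Lima, New York, London.
sample = [(-13.53, -71.97), (-12.05, -77.04), (40.71, -74.01), (51.51, -0.13)]
print(f"{share_inside(sample):.0%} of the sample falls inside the box")
```
]]></preformat>
        <p>A polygon of the national border would give a more precise answer than the box, at the cost of an extra geometry dependency.</p>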
        <p>[Figure 18: map of geo-tagged tweets. Legend: Negative, Neutral, Positive.]</p>
        <p>This study was a good way to explore
opinions about Perú, but we have to keep in
mind some limitations, such as the representativeness of the sample
and the popularity of this social network in our country.</p>
        <p>Twitter activity was highest in July, during
the summer vacation period in northern hemisphere countries,
and in October, when some
incidents occurred in the most visited place,
Cusco.</p>
        <p>Acknowledgments</p>
        <p>We would like to express our special thanks to
the market research team at Promperú, who gave us
the opportunity to develop this project.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <surname>Promperú</surname>
          </string-name>
          .
          <year>2016</year>
          . PENTUR: Plan Estratégico Nacional de Turismo. Ministerio de Turismo y Comercio Exterior. Lima, Perú.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <surname>Mohammad</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Turney</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          <year>2010</year>
          .
          <article-title>Emotions evoked by common words and phrases: Using Mechanical Turk to create an Emotion Lexicon</article-title>
          .
          <source>In Proceedings of the NAACL-HLT 2010 Workshop on Computational Approaches to Analysis and Generation of Emotion in Text</source>
          , Los Angeles, CA.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <surname>Agarwal</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xie</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vovsha</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rambow</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Passonneau</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          <year>2011</year>
          .
          <article-title>Sentiment Analysis of Twitter Data</article-title>
          .
          <source>In Proceedings of the workshop on languages in social media</source>
          , pages
          <fpage>30</fpage>
          -
          <lpage>38</lpage>
          .
          <publisher-name>Association for Computational Linguistics</publisher-name>
          . Portland, OR.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <surname>Dodds</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Harris</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kloumann</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bliss</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Danforth</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          <year>2011</year>
          .
          <article-title>Temporal Patterns of Happiness and Information in a Global Social Network: Hedonometrics and Twitter</article-title>
          .
          <source>PLoS ONE</source>
          <volume>6</volume>
          (
          <issue>12</issue>
          ): e26752. doi:
          <pub-id pub-id-type="doi">10.1371/journal.pone.0026752</pub-id>
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <surname>Antoniadis</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vrana</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Zafiropoulos</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          <year>2014</year>
          .
          <article-title>Promoting European Countries' Destination Image Through Twitter</article-title>
          . In
          <source>European Journal of Tourism, Hospitality and Recreation</source>
          ,
          <volume>5</volume>
          (
          <issue>1</issue>
          ):
          <fpage>85</fpage>
          -
          <lpage>103</lpage>
          . Leiria, Portugal.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <surname>Oku</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hattori</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Kawagoe</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          <year>2015</year>
          .
          <article-title>Tweet-mapping method for tourist spots based on now-tweets and spot-photos</article-title>
          .
          <source>In 19th International Conference on Knowledge Based and Intelligent Information and Engineering Systems</source>
          <volume>60</volume>
          (
          <year>2015</year>
          ):
          <fpage>1318</fpage>
          -
          <lpage>1327</lpage>
          . Shiga, Japan.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <surname>Bassolas</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lenormand</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tugores</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gonçalves</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Ramasco</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <year>2016</year>
          .
          <article-title>Touristic site attractiveness seen through Twitter</article-title>
          .
          <source>In EPJ Data Science</source>
          <volume>5</volume>
          :
          <elocation-id>12</elocation-id>
          . doi:
          <pub-id pub-id-type="doi">10.1140/epjds/s13688-016-0073-5</pub-id>
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <surname>O'Connor</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Balasubramanyan</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Routledge</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Smith</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          <year>2010</year>
          .
          <article-title>From Tweets to Polls: Linking Text Sentiment to Public Opinion Time Series</article-title>
          .
          <source>In Proceedings of the International AAAI Conference on Weblogs and Social Media</source>
          , Washington, DC.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <surname>Plutchik</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          <year>1980</year>
          .
          <article-title>A general psychoevolutionary theory of emotion</article-title>
          .
          <source>Emotion: Theory, research and experience</source>
          ,
          <volume>1</volume>
          (
          <issue>3</issue>
          ):
          <fpage>3</fpage>
          -
          <lpage>33</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <surname>Jockers</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <year>2015</year>
          .
          <article-title>Syuzhet: Extract Sentiment and Plot Arcs from Text</article-title>
          .
          <source>CRAN Network</source>
          . https://github.com/mjockers/syuzhet
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <string-name>
            <surname>Benzécri</surname>
            ,
            <given-names>J.-P.</given-names>
          </string-name>
          <year>1973</year>
          .
          <source>L'Analyse des Données. Tome 1: La Taxinomie. Tome 2: L'Analyse des Correspondances</source>
          . Dunod, Paris, France.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <string-name>
            <surname>Venables</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Ripley</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          <year>2002</year>
          .
          <source>Modern Applied Statistics with S</source>
          . Fourth edition. Springer, London, UK.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          <collab>SAS Institute Inc</collab>
          .
          <year>2013</year>
          .
          <article-title>SAS® Text Miner 13.1 Reference Help</article-title>
          . Cary, NC: SAS Institute Inc.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          <string-name>
            <surname>Balbi</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stawinoga</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Triunfo</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          <year>2012</year>
          .
          <article-title>Text Mining tools for extracting knowledge from Firms Annual Reports</article-title>
          .
          <source>In JADT 2012: 11es Journées internationales d'Analyse statistique des Données Textuelles</source>
          . Edited by Dister A., Longrée D., Purnelle G., pages
          <fpage>67</fpage>
          -
          <lpage>79</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          <string-name>
            <surname>Balbi</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Stawinoga</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <year>2013</year>
          .
          <article-title>Mining the ambiguity: correspondence and network analysis for discovering word sense</article-title>
          .
          <source>In book: SIS 2016 Scientific Meeting</source>
          , pages
          <fpage>1</fpage>
          -
          <lpage>7</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          <collab>BBC News</collab>
          .
          <year>2016</year>
          .
          <article-title>Peru investigates death of 10,000 Titicaca water frogs</article-title>
          . Published October 18.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          <string-name>
            <surname>StudioKnow.</surname>
          </string-name>
          <year>2016</year>
          .
          <article-title>Is it Safe to Fly Over the Nazca Lines?</article-title>
          Published May 11.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
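          <string-name>
            <surname>Bird</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Loper</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Klein</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          <year>2009</year>
          .
          <source>Natural Language Processing with Python</source>
          . O'Reilly Media Inc.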
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          <string-name>
            <surname>Dean</surname>
          </string-name>
          and
          <article-title>Bill 2007</article-title>
          .
          <article-title>How to Write a Spelling Correc-</article-title>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>