                      Opinion Mapping Travelblogs

            Efthymios Drymonas, Alexandros Efentakis, and Dieter Pfoser

                    Institute for the Management of Information Systems
                                    Research Center Athena
                             G. Mpakou 17, 11524 Athens, Greece
              {edrimon|efentakis|pfoser}@imis.athena-innovation.gr



       Abstract. User-contributed content represents a valuable information source pro-
       vided one can make sense of the large amounts of unstructured data. This work
       focusses on geospatial content and specifically on travelblogs. Users writing sto-
       ries about their trips and related experiences effectively provide geospatial infor-
       mation albeit in narrative form. Identifying this geospatial aspect of the texts by
       means of applying information extraction techniques and geocoding, we relate
       portions of texts to locations, e.g., a paragraph is associated with a spatial bound-
       ing box. To further summarize the information, we assess the opinion (“mood”)
       of the author in the text. Aggregating this mood information for places, we essen-
       tially create a geospatial opinion map based on the user-contributed information
       contained in the articles of travelblogs. We assessed the proposed approach with
       a corpus of more than 150k texts from various sites.


1   Introduction
Crowdsourcing moods, and in our specific case opinions, from user-contributed data
has recently become an interesting field with the advent of micro-blogging services
such as Twitter. Here, blog entries reflect a myriad of different user opinions that,
when integrated, can give us valuable information about, e.g., the stock market [4]. In
this work, our focus is on (i) extracting user opinions about places from travel blog
entries, (ii) aggregating such opinion data, and, finally, (iii) visualizing it.
    The specific contributions of this work are as follows. In an initial stage, several
travelblog Web sites were crawled and over 150k texts were collected; Figure 5 shows
an example travelblog entry. The collected texts are then geoparsed and geocoded to
link placename identifiers (toponyms) to location information. With paragraphs as the
finest granularity for opinion information, texts are assessed with the OpinionFinder
tool and each paragraph is assigned a score ranging from very negative to very positive.
Scores are linked to the bounding box of the paragraph and are aggregated using a
global grid, i.e., the score of a specific paragraph is associated with all intersecting grid
cells. Aggregation of opinions is then performed simply by computing the average of
all scores for each cell. Finally, the score can be visualized by assigning a color to
each cell.
    While to the best of our knowledge there exists no work aiming at extracting opin-
ions about places from travel blogs, we can cite the following related work. The concept
of information visualization using maps is gaining significant interest in various research
fields; as examples, we can cite [26], [23], [29] and [12]. For the purpose of recognizing
toponyms, the various approaches use ideas and work from the fields of Natural Language
Processing (NLP), Part-Of-Speech (POS) tagging and, as part of Information Extraction,
Named Entity Recognition (NER) [13]. These approaches can be roughly classified as
rule-based [6], [7], [8], [21], [30] and machine-learning/statistical [14], [16], [26], [22].
Once toponyms have been recognized, a toponym resolution procedure resolves geo-
ambiguity. Many methods use a prominence measure, such as population, combined with
other approaches [21], [25]. With respect to geocoding, we can cite as an example [17],
one of the first works on geocoding, which describes a navigational tool for browsing
Web resources by geographic proximity as an alternative means of Web navigation. Web-
a-Where [1] is another system for geocoding Web pages. It assigns to each page the
geographic focus that the page discusses as a whole. The tagging process targets large
collections of Web pages to facilitate a variety of location-based applications and data
analyses. The work presented in [15] identifies and disambiguates references to
geographic locations. Another method that uses information extraction techniques to
geocode news is described in [26]. Other toponym resolution strategies involve the use
of geospatial measures such as minimizing total geographic coverage [14] or mini-
mizing pairwise toponym distance [16]. An approach for the extraction of routes from
narratives is given in [9]; the IE approach proposed there has been adapted to fit the
requirements of this work. While statistical NER methods can be useful for the analysis
of static corpora, they are not well-suited to continuously user-contributed travel
narratives, due to the dynamic and ever-changing nature of such content [25]. We
therefore rely on a rule-based solution built as a modular pipeline of distinct, independent
and well-defined components based on NLP and IE methods, as described in the next
section. Regarding related work on opinion classification and sentiment analysis [20],
most methods rely on streaming data [18], [10], [11], [19]. Recently, [2] discussed the
challenges that Twitter streaming data poses; that work focusses on sentiment analysis
and proposes the sliding-window Kappa statistic as an evaluation metric for data streams.
    The remainder of this work is organized as follows. Section 2 describes the infor-
mation extraction techniques employed in our approach dealing specifically with the
aspects of geoparsing and geocoding travel blog entries. Section 3 outlines a method
for computing user sentiment scores from travel blog entries. Section 4 outlines how
such scores can be aggregated and visualized based on geospatial locations. In addition,
some specific examples are shown to give an initial validation of the proposed approach.
Finally, Section 5 presents conclusions and directions for future work.

2   Information Extraction
In what follows, we describe in detail the processing pipeline, which overall uses an
HTML document as input (travel blog article) and produces a structured XML file con-
taining the various entities and their respective attributes (toponyms and coordinate
information for the text).
    The pipeline consists of four parts (cf. Figure 1): (i) the HTML parsing module, (ii)
linguistic pre-processing, (iii) the main IE engine (semantic analysis) and (iv) the
geocoding/post-processing part. In the next subsection, we describe the first part of our
processing pipeline, i.e., the collection of HTML texts, their parsing and their conversion
to plain-text format, in order to prepare the documents for the subsequent linguistic
pre-processing step.




                                    Fig. 1. IE architecture pipeline
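
As a minimal sketch of this data flow, consider the following outline. The stage
functions are hypothetical placeholders standing in for the actual modules
(Regain/Jericho, GATE/ANNIE, Cafetiere, YAHOO! geocoders) described in the
subsections below.

# Hypothetical outline of the four-stage pipeline; each stub stands in
# for one of the components described in Sections 2.1-2.4.

def html_to_text(html: str) -> str:
    """(i) HTML parsing: isolate the narrative plain text."""
    ...

def preprocess(text: str):
    """(ii) Linguistic pre-processing: tokens, sentences, POS, lemmas."""
    ...

def extract_toponyms(annotations):
    """(iii) Semantic analysis: ontology lookup plus extraction rules."""
    ...

def geocode(toponyms):
    """(iv) Geocoding/post-processing: attach coordinates, write XML."""
    ...

def process(html: str):
    return geocode(extract_toponyms(preprocess(html_to_text(html))))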




2.1   Web Crawling
For collecting travel blog articles containing rich geospatial information, we crawled
Web sites providing travelblog authoring services. Each Web site has its own HTML
layout, and isolating the text of interest from crawled and parsed HTML pages is done
by hand. Hence, we needed Web sites with massive amounts of this type of document.
For this purpose we crawled travelpod.com, travelblog.org, traveljournals.net
and worldhum.com, resulting in more than 150,000 documents. For crawling the Web
sites, we used the Regain crawler1, which creates a Lucene2 index of the documents’
information, while for HTML parsing and the extraction of useful plain-text narratives,
we used the Jericho HTML parser3.
 1 http://regain.sourceforge.net/
 2 http://lucene.apache.org/
 3 http://jericho.htmlparser.net/
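
For illustration, the following sketch shows the kind of plain-text extraction
this step performs; it uses Python's standard html.parser module as a stand-in
for the Jericho HTML parser (a Java library) actually used.

from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect visible text, skipping script/style elements; a stand-in
    for the Jericho-based extraction used in the paper."""
    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

def html_to_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    return "\n".join(parser._chunks)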
2.2   Linguistic Pre-Processing
Before the core IE engine can extract the objects of interest, the parsed plain-text
documents must be prepared accordingly. Such preparation involves linguistic pre-
processing tools that analyze natural-language documents in terms of distinct base
units (i.e., words), sentences, part-of-speech and morphology. We use the ANNIE
tools, contained in the GATE release4, to perform this initial part of the analysis. To
this end, our processing pipeline comprises a set of four modules: (i) the ANNIE
Tokenizer, (ii) the ANNIE Sentence Splitter, (iii) the ANNIE POS Tagger and (iv) the
WordNet Lemmatiser.
     The intermediate processing results are passed on to each subsequent analysis tool
as GATE document annotation objects. The output of this analysis part is the analyzed
document, transformed into CAS/XML format5, which is passed as input to the
subsequent semantic analysis component, the Cafetiere IE engine [3]. Cafetiere
combines the linguistic information acquired during the pre-processing stage with
knowledge-resource information, namely the lookup ontology and the analysis rules,
to semantically analyze the documents and recognize spatial information, as we will
see later in this section.
     The first step in the pipeline is tokenization, i.e., recognizing basic text units
(tokens) in the input text, such as words and punctuation, together with orthographic
analysis, i.e., associating orthographic features such as capitalization and the use of
special characters and symbols with the recognized tokens. The tools used are the
ANNIE Tokenizer and Orthographic Analyzer.
     Sentence splitting, performed in our case by the ANNIE Sentence Splitter, aims at
the identification of sentence boundaries in a text.
     Part-of-speech (POS) tagging is the process of assigning a part-of-speech class,
such as noun, verb, etc., to each word in the input text. The ANNIE POS Tagger is a
variant of the Brill transformation-based learning tagger [5], which applies a
combination of lexicon information and transformation rules for correct POS
classification.
     Lemmatisation is used for text normalisation purposes. With this process we retrieve
a token’s base form; e.g., for the words travelling, traveler and traveled the
corresponding lemma is travel, and for are and were it is be. We exploit this information
in the semantic rules described below. For this purpose we use the JWNL WordNet
Java Library API6 to access the WordNet relational dictionary. The output of this step
is included in the GATE document annotation information.
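
Purely as an illustration of these four pre-processing steps (the actual pipeline
uses GATE/ANNIE and JWNL, not NLTK), a rough equivalent can be sketched with
the NLTK toolkit:

# Illustrative stand-in for the ANNIE/JWNL pre-processing modules.
# Requires the NLTK data packages punkt, averaged_perceptron_tagger
# and wordnet.
import nltk
from nltk.stem import WordNetLemmatizer

def preprocess(text):
    """Yield (token, POS tag, lemma) triples, sentence by sentence."""
    lemmatizer = WordNetLemmatizer()
    for sentence in nltk.sent_tokenize(text):                # sentence splitting
        tagged = nltk.pos_tag(nltk.word_tokenize(sentence))  # tokens + POS
        for word, tag in tagged:
            # Map Penn Treebank tag prefixes to WordNet POS classes.
            pos = {"V": "v", "J": "a", "R": "r"}.get(tag[:1], "n")
            yield word, tag, lemmatizer.lemmatize(word.lower(), pos)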

2.3   Semantic Analysis
Semantic analysis relates the linguistic pre-processing results to ontology information,
as discussed in the following subsection on ontology lookup, and applies semantic
analysis grammar rules, i.e., documents are analyzed semantically to discover spatial
concepts and relations.
 4 http://gate.ac.uk/
 5 CAS is an XML scheme called Common Annotation Scheme allowing for a wide range of
   annotations: structural, lexical, semantic and conceptual.
 6 http://sourceforge.net/projects/jwordnet/
    For this purpose we used the Cafetiere IE engine, which compiles a set of semantic
analysis grammar rules into a cascade of finite-state transducers so as to recognize the
concepts of interest in text. The Cafetiere IE engine combines all previously acquired
linguistic and semantic information with contextual information. We modified Cafetiere
and implemented it as a GATE pipeline module (GATE CREOLE) for the purpose of
performing ontology lookup and rule-based semantic analysis on the information
acquired by the previous pipeline modules, in the form of GATE annotation sets. The
input to this process are the GATE annotation objects resulting from the linguistic
pre-processing stage, stored, for each individual document, in the CAS/XML format
that Cafetiere requires.


Cafetiere Ontology Lookup The use of lexico-semantic knowledge resources assists
in the identification of named entities. These semantic knowledge resources may take
the form of lists (gazetteers) or of more complex ontologies providing mappings of text
strings to semantic categories, such as male/female person names, known
organizations and known identifiers of named entities. In our case, the named entities
we want to extract with IE methods represent location-based information. For example,
a gazetteer of location designators might have entries such as “Sq.”, “blvd.”, “st.”, etc.,
denoting squares, boulevards and streets respectively. Similarly, various sorts of
gazetteers are available for given person names, titles, location names, companies, cur-
rencies, nationalities, etc. Thus, the named entity (NE) recognizer can use gazetteer
information to classify a text string as denoting an entity of a particular class.
However, in order to refer to specific individual entities, object identifiers are required
in addition to class labels, enabling aliases or abbreviations to be mapped to a concrete
individual. For example, for an entity such as “National Technical University of Athens”
the abbreviation “NTUA” could be included in the knowledge resource as an
alias for that entity. Thus, knowledge resources more sophisticated than plain
gazetteers, in the form of ontologies, may be used to provide this type of richer semantic
information and to specify and represent more information, if necessary, than identity
and class inclusion.
    In this way, the Cafetiere ontology lookup module accesses a previously built ontology
to retrieve potential semantic class information for individual tokens or phrases. All
types of conceptual information related to domain-specific entities, such as terms or
words that denote spatial concepts, or properties and relations of domain in-
terest, are pre-defined in this ontology. For example, consider the partial ontology shown
in Figure 2. Class “LOCVERB” stores verbs that, when matched to a text phrase, are
likely to indicate a spatial relationship between the corresponding referenced concepts.
We label as semantic any classification of tokens according to their meaning in the field
of the application, in our case geosemantics. This could be done, on a broad-coverage
level, by reading information about most content words from a comprehensive resource
such as the WordNet lexicon. However, the practice in information extraction applications,
as discussed in the previous paragraph, has been to make the processing application-specific
by using hand-crafted lists of the semantic categories of only the relevant words and
phrases. The ontology used in our experimentation was created by manually analyzing a
large number of texts and iteratively refining the ontology with words (e.g., verbs) that
                          Fig. 2. Sample ontology contents (Protégé ontology editor)




when matched to a text phrase are likely to indicate a spatial relationship between the
corresponding referenced concepts. Summarizing, the lookup stage of analysis:

 1. Supplies semantic classes (concepts) corresponding to words and phrases.
 2. Supplies object identifiers for known instances, including cases where aliases and
    abbreviations name the same instance (for example, “National Technical University
    of Athens” and “NTUA”).
 3. Supplies properties of known instances, for example the country of which a city is
    the capital.
 4. Uses verbs of interest to the application in order to identify potential unknown
    instances within a phrase.
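
Purely as an illustration of what the lookup stage supplies (the actual module is a
Cafetiere component backed by the manually built ontology of Figure 2; all entries
below are hypothetical):

# Hypothetical, minimal stand-in for the ontology lookup stage.
ONTOLOGY = {
    "sq.":        {"class": "LOC_DESIGNATOR", "denotes": "square"},
    "blvd.":      {"class": "LOC_DESIGNATOR", "denotes": "boulevard"},
    "ntua":       {"class": "ORGANIZATION",
                   "id": "National Technical University of Athens"},
    "athens":     {"class": "CITY", "capital_of": "Greece"},
    "arrived at": {"class": "LOCVERB"},  # hints at a spatial relationship
}

def lookup(phrase):
    """Return semantic class and instance information, if known."""
    return ONTOLOGY.get(phrase.lower())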


Cafetiere Information Extraction engine The approaches to Named Entity recogni-
tion with IE methods can be divided into two main categories:

 – Linguistic/rule-based approaches: here, Named Entity recognition is based on
   linguistic/semantic rules defining the possible linguistic patterns denoting Named
   Entity concepts, as adopted, for example, by ANNIE7 and Cafetiere [3]. These
   approaches can achieve better results than most statistical or machine-learning
   approaches, but they require extensive human effort for the development of the
   necessary knowledge resources (rules and lexico-semantic resources, such as the
   ontologies described in the ontology lookup section above). For this reason, the
   adaptation of rule-based systems to new domains is a slow and laborious process.
 7 http://gate.ac.uk/
 – Machine learning/statistics-based approaches: these approaches view Named
   Entity recognition as a classification problem; they have gained increased
   popularity due to their relatively rapid development and domain customization
   and the reduced amount of human effort required.

    Cafetiere is a rule-based system for IE. A set of linguistic patterns (i.e., extraction
rules) is written taking into account the lookup ontology and all information previously
acquired during linguistic pre-processing. The semantic analysis rules are developed
as a set of context-sensitive/context-free grammar (CSG/CFG) rules and are compiled
into a cascade of finite-state transducers so as to recognize the concepts of interest in
plain text.
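
As a toy illustration of how one such rule might fire (Cafetiere's actual rule
formalism is different and considerably richer), a pattern combining a LOCVERB
entry with a capitalized candidate toponym can be approximated as:

import re

# Toy stand-in for a single extraction rule: a LOCVERB followed by a
# capitalized word sequence suggests a toponym mention. The verb list
# is hypothetical.
LOCVERBS = r"(?:arrived at|travelled to|departed from)"
RULE = re.compile(LOCVERBS + r"\s+((?:[A-Z][a-z]+\s?)+)")

def toponym_candidates(sentence):
    return [m.strip() for m in RULE.findall(sentence)]

# toponym_candidates("We arrived at Washington Monument at noon.")
# returns ['Washington Monument']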

2.4     Post-Processing
In this part, all information regarding each object of interest for each document in the
collection is recorded as a GATE annotation set object. For each document, we have
collected information about all extracted entities, along with their respective paragraph,
sentence and character offset within the document. During the HTML parsing process
we keep the geographic scope each document refers to, in order to use this information
when geocoding the extracted entities. For geocoding, we initially employed YAHOO!
Placemaker8 and used it in combination with Cafetiere’s output in order to deliver better
results. We observed that Placemaker worked well for disambiguating some entities, but
it identified significantly fewer place entities than our IE engine. Thus, for the remaining
entities extracted by Cafetiere, we applied YAHOO! Placefinder9 to geocode this place
information, passing the scope information described above to deliver more accurate
results.
     Finally, for each HTML travel blog entry (narrative), we created a collection of
the extracted geo-entities, some of which could not be geocoded. For each of these
entities there is specific information (acquired in the previous pipeline steps) about
where it was encountered in the respective document, namely sentence, paragraph and
character offset. Additionally, for each document, we calculated the mean coordinates
and the standard distance of all geocoded points extracted. All this information, along
with the local parsed text file path and the respective URL of the document, is finally
stored in XML format for each corresponding plain-text narrative. Samples of a plain-
text narrative and the corresponding structured XML file are shown in Figure 3 and
Figure 4, respectively. The XML tags in Figure 4 denote either statistical information,
such as the mean center and the standard distance of all geocoded locations of each
document, or information related to each extracted entity, i.e., the character offsets and
the sentence and paragraph IDs.
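
The mean center is the average of the geocoded coordinates, and the standard
distance is the root-mean-square deviation of the points from that center. A sketch,
treating coordinates as planar (a reasonable simplification for documents of limited
spatial extent):

from math import sqrt

def mean_center_and_std_distance(points):
    """points: list of (lat, lon) pairs of the geocoded toponyms of one
    document. Returns ((mean_lat, mean_lon), standard_distance), both
    in degrees."""
    n = len(points)
    mean_lat = sum(lat for lat, _ in points) / n
    mean_lon = sum(lon for _, lon in points) / n
    sd = sqrt(sum((lat - mean_lat) ** 2 + (lon - mean_lon) ** 2
                  for lat, lon in points) / n)
    return (mean_lat, mean_lon), sd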


3      Opinion Mapping
Having geocoded the travel blog entries, in the following step we want to assign
sentiment information (“mood”) to the text. To this effect, we use OpinionFinder [28],
 8 http://developer.yahoo.com/geo/placemaker/
 9 http://developer.yahoo.com/geo/placefinder/
                                        Fig. 3. Sample plain text




                                       Fig. 4. Resulting XML file




a system that performs subjectivity analysis, automatically identifying when opinions,
sentiments, or speculations are present in text. It aims to identify subjective sentences,
as well as to mark various aspects of subjectivity within these sentences, including the
source (holder) of the subjectivity and the words included in phrases expressing positive
or negative sentiments.
    OpinionFinder operates as one large pipeline. Conceptually, the pipeline can be
divided into two parts. The first part performs mostly general purpose document pro-
cessing (e.g., tokenization and part-of-speech tagging). The second part performs the
subjectivity analysis. The results of the subjectivity analysis are returned to the user in
the form of SGML/XML markup of the original documents.
    For the first part, OpinionFinder takes any incoming text source and removes HTML
or XML meta-information. Sentences are split and POS-tagged using OpenNLP10, an
open-source suite of Java-based NLP tools that perform sentence detection, tokenization,
POS tagging, chunking and parsing, named-entity detection, and coreference resolution
using the OpenNLP Maxent machine-learning package. Next, stemming is performed
using Steven Abney’s SCOL v1K stemmer program11. SUNDANCE (Sentence
UNDerstanding And Concept Extraction) [28] is used to provide semantic
10 http://opennlp.sourceforge.net/
11 http://www.vinartus.net/spa/
class tags, to identify the extraction patterns needed by the sentence classifiers, to
identify the source of subjective content and to distinguish author statements from
related or quoted statements. A final parse in batch mode establishes constituency parse
trees, which are converted to dependency parse trees for named-entity and subject
detection.
    At this point, for the second part, a Naive Bayes classifier identifies subjective sen-
tences. The classifier is trained against subjective and objective sentences generated by
two additional rule-based classifiers drawing from large corpora [27]. Next, a direct
subjective expression and speech event classifier tags the direct subjective expressions
and speech events found within the document using WordNet12 . The final step applies
actual sentiment analysis to sentences that have been identified as subjective. This is
accomplished with two classifiers that were developed using the BoosTexter [24] ma-
chine learning program and trained on the MPQA Corpus13 .
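
To make the subjectivity classification step concrete, the following toy sketch trains
a Naive Bayes sentence classifier with scikit-learn; it is not OpinionFinder's
classifier, and the tiny training set is invented purely for illustration.

# Toy Naive Bayes subjectivity classifier; OpinionFinder's classifiers
# are trained on far larger, automatically labeled corpora [27].
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

train_sentences = [                      # invented examples
    "The cathedral was absolutely stunning.",
    "I hated the crowded, dirty station.",
    "The museum opens at 9 am.",
    "The bus departs from platform 4.",
]
train_labels = ["subjective", "subjective", "objective", "objective"]

clf = make_pipeline(CountVectorizer(), MultinomialNB())
clf.fit(train_sentences, train_labels)
print(clf.predict(["We loved the lively old town."]))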


4     Mapping Opinion Scores
OpinionFinder produces sentiment information assigned to paragraphs of texts. In the
following, we describe how this information can be aggregated for specific locations.

4.1   Aggregating Sentiments
OpinionFinder was applied to all texts of our collection of 150k travel blog entries,
assigning sentiment data to each paragraph of the collection. In the analysis that follows,
only paragraphs containing geospatial data were retained. For each of these paragraphs
we keep the total positive and negative sentiment scores as computed by Opinion-
Finder.
    Each paragraph contains zero, one or multiple geographic entities that were suitably
geocoded. To bound the spatial extent of a paragraph, we chose to spatially visualize
only paragraphs in which the minimum bounding rectangle (MBR) of the contained
toponyms does not exceed 0.5 degrees in either dimension (i.e., max latitude − min
latitude ≤ 0.5 AND max longitude − min longitude ≤ 0.5). Consequently, only
paragraphs of limited and focused spatial extent are visualized, thus preventing
paragraphs that refer to larger geographic entities (e.g., Europe) from dominating the
results.
    We used five categories for mapping opinion scores. The categories and their
respective colors are given in Table 1, where the categories scale from negative (red)
to positive (green).
    The proposed approach is clarified by the following example. A sample document14
(Figure 5) contains several paragraphs mentioning Washington D.C. and its landmarks.
For each of this document’s paragraphs, an MBR covering the discovered toponyms was
created and each paragraph was assigned a category according to Table 1. This document
may therefore be spatially visualized on a map as shown in Figure 6.
    Although this approach is viable when there is a limited number of documents and
paragraphs, we need to overcome the following problem: multiple paragraphs from
different documents and with different scores may partially target the same area, i.e., we need
12 http://wordnet.princeton.edu/
13 http://nrrc.mitre.org/NRRC/publications.htm
14 http://www.travelpod.com/travel-blog-entries/drfumblefinger/1/1269627251/tpod.html
                            Result (positive − negative)   Colour
                                       ≤ −3                 Red
                                     −1 or −2               Orange
                                        0                   Yellow
                                      1 or 2                Olive
                                       ≥ 3                  Green
                      Table 1. Opinion mapping to colour representation
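
A direct transcription of Table 1, where a paragraph's score is its positive count
minus its negative count:

def score_to_colour(positive, negative):
    """Map a paragraph's OpinionFinder sentiment counts to a Table 1 colour."""
    score = positive - negative
    if score <= -3:
        return "red"
    if score <= -1:
        return "orange"
    if score == 0:
        return "yellow"
    if score <= 2:
        return "olive"
    return "green"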




                 Fig. 5. Washington D.C. - sample document and toponyms




to visualize partially overlapping MBRs with different scores (colors). To do so, we
split each paragraph MBR into the cells of a regular grid of 0.0045 degrees (corre-
sponding to roughly 500m) in each dimension. For each of these cells we sum up the
sentiment scores of all paragraph MBRs that contain it. With this approach, instead of
trying to visualize overlapping paragraph MBRs with different scores (colors), we
visualize distinct small cells, each assigned a unique score (and color). Consequently,
it is easy to visualize the overall sentiment scores independent of how many paragraphs
target the same area.
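
A sketch of this rasterization step, under the assumption stated above that a
paragraph's score is attributed in full to every 0.0045-degree cell its MBR intersects:

from collections import defaultdict

CELL = 0.0045  # grid resolution in degrees, roughly 500 m

def cells_for_mbr(min_lat, min_lon, max_lat, max_lon):
    """Yield (row, col) indices of all grid cells the MBR intersects."""
    for r in range(int(min_lat // CELL), int(max_lat // CELL) + 1):
        for c in range(int(min_lon // CELL), int(max_lon // CELL) + 1):
            yield r, c

def aggregate(paragraphs):
    """paragraphs: iterable of (mbr, score) pairs, with
    mbr = (min_lat, min_lon, max_lat, max_lon).
    Returns the summed sentiment score per grid cell."""
    grid = defaultdict(int)
    for mbr, score in paragraphs:
        for cell in cells_for_mbr(*mbr):
            grid[cell] += score
    return grid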

4.2   Opinion Map Examples
Further examples include the geospatial opinion map of Amsterdam shown in Figure 7.
It is interesting to observe that while most of the city is shaded green, the areas around
the train station and the Red Light district are shown in red, i.e., they express rather
negative sentiment.
    Figure 8 gives a geospatial opinion map of Central Europe indicating the areas men-
tioned in the travel blogs. What can be observed is that positive sentiments are associ-
ated with areas in Switzerland and also Italy, while urban areas such as Brussels overall
attract more negative sentiments.

4.3   Summary
Our initial experiments with the creation of geospatial opinion maps derived from sub-
jective travelblog entries show that people share a clear bias towards certain geographic
areas. However, since in this work we only performed a simple aggregation of the scores
generated by the OpinionFinder tool, a more in-depth analysis of the results will be
required to derive accurate statements and trends.
                 Fig. 6. Washington D.C. - geospatial opinion visualization




5   Conclusions

Aggregating opinions is important for utilizing and assessing user-generated content.
This work provides a means of visualizing sentiments for specific geographic areas as
derived from travel blog entries. To demonstrate the approach, several travel blog sites
were crawled and a total of more than 150,000 pages/articles were processed. Using
geoparsing and geocoding tools, (i) the content was georeferenced and (ii) sentiment
information was derived using the OpinionFinder tool. In the proposed approach,
sentiment information from various articles relating to the same geographic area is
aggregated and visualized by means of a geospatial heat map. Directions for future
work are as follows. The current approach for aggregating user sentiment for geographic
areas is rather simple, and a more in-depth analysis of the results is needed to derive
accurate statements and trends. An obvious improvement will also be to examine and
include microblogging content streams; here, sentiment information would be updated
live and would thus represent an accurate picture of the situation in a specific geo-
graphic area over time. Finally, OpinionFinder is a general-purpose tool for deriving
user sentiment; more involved approaches exist and need to be examined or developed
for the case of geospatial data.


Acknowledgements

The research leading to these results has received funding from the European Union
Seventh Framework Programme - Marie Curie Actions, Initial Training Network GEOCROWD
(http://www.geocrowd.eu) under grant agreement No. FP7-PEOPLE-2010-ITN-264994.
             Fig. 7. Amsterdam, The Netherlands - geospatial opinion visualization



References
 1. E. Amitay, N. Har’El, R. Sivan, and A. Soffer. Web-a-where: geotagging web content. In
    Proceedings of the 27th annual international ACM SIGIR conference on Research and de-
    velopment in information retrieval, SIGIR ’04, pages 273–280, New York, NY, USA, 2004.
    ACM.
 2. A. Bifet and E. Frank. Sentiment knowledge discovery in twitter streaming data. In Proceed-
    ings of the 13th international conference on Discovery science, DS’10, pages 1–15, Berlin,
    Heidelberg, 2010. Springer-Verlag.
 3. W. J. Black, J. McNaught, A. Vasilakopoulos, K. Zervanou, B. Theodoulidis, and F. Rinaldi.
    Cafetiere: Conceptual annotations for facts, events, terms, individual entities and relations.
    Technical report, Jan 2005. Parmenides Technical Report, TR-U4.3.1.
 4. J. Bollen, H. Mao, and X.-J. Zeng. Twitter mood predicts the stock market. Journal of
    Computational Science, 2(1):1–8, 2011.
 5. E. Brill. Transformation-based error-driven learning and natural language processing: A case
    study in part-of-speech tagging. Computational Linguistics, 21:543–565, 1995.
 6. D. Buscaldi and P. Rosso. A conceptual density-based approach for the disambiguation of
    toponyms. Int. J. Geogr. Inf. Sci., 22:301–313, January 2008.
 7. P. Clough. Extracting metadata for spatially-aware information retrieval on the internet. In
    Proceedings of the 2005 workshop on Geographic information retrieval, GIR ’05, pages
    25–30, New York, NY, USA, 2005. ACM.
 8. J. Ding, L. Gravano, and N. Shivakumar. Computing geographical scopes of web resources.
    In Proceedings of the 26th International Conference on Very Large Data Bases, VLDB ’00,
    pages 545–556, San Francisco, CA, USA, 2000. Morgan Kaufmann Publishers Inc.
 9. E. Drymonas and D. Pfoser. Geospatial route extraction from texts. In DMG ’10: Proceed-
    ings of the 1st ACM SIGSPATIAL International Workshop on Data Mining for Geoinformat-
    ics, pages 29–37, New York, NY, USA, 2010. ACM.
10. A. Go, R. Bhayani, and L. Huang. Twitter Sentiment Classification using Distant Supervi-
    sion. Technical report, Stanford University.
11. B. J. Jansen, M. Zhang, K. Sobel, and A. Chowdury. Micro-blogging as online word of
    mouth branding. In Proceedings of the 27th international conference extended abstracts on
                        Fig. 8. Europe - geospatial opinion visualization




    Human factors in computing systems, CHI EA ’09, pages 3859–3864, New York, NY, USA,
    2009. ACM.
12. R. Jianu and D. Laidlaw. Visualizing gene co-expression as google maps. In G. Be-
    bis, R. Boyle, B. Parvin, D. Koracin, R. Chung, R. Hammound, M. Hussain, T. Kar-Han,
    R. Crawfis, D. Thalmann, D. Kao, and L. Avila, editors, Advances in Visual Computing,
    volume 6455 of Lecture Notes in Computer Science, pages 494–503. Springer Berlin / Hei-
    delberg, 2010.
13. D. Jurafsky and J. H. Martin. Speech and Language Processing: An Introduction to Natural
    Language Processing, Computational Linguistics and Speech Recognition (Prentice Hall
    Series in Artificial Intelligence). Prentice Hall, 1 edition, 2000.
14. J. L. Leidner. Toponym resolution in text: annotation, evaluation and applications of spatial
    grounding. SIGIR Forum, 41:124–126, December 2007.
15. M. D. Lieberman, H. Samet, and J. Sankaranarayanan. Geotagging with local lexicons to
    build indexes for textually-specified spatial data. In International Conference on Data Engi-
    neering, pages 201–212, 2010.
16. M. D. Lieberman, H. Samet, J. Sankaranarayanan, and J. Sperling. Steward: architecture
    of a spatio-textual search engine. In Proceedings of the 15th annual ACM international
    symposium on Advances in geographic information systems, GIS ’07, pages 25:1–25:8, New
    York, NY, USA, 2007. ACM.
17. K. S. McCurley. Geospatial mapping and navigation of the web. In Proceedings of the 10th
    international conference on World Wide Web, WWW ’01, pages 221–229, New York, NY,
    USA, 2001. ACM.
18. B. O’Connor, R. Balasubramanyan, B. Routledge, and N. Smith. From tweets to polls:
    Linking text sentiment to public opinion time series, 2010.
19. A. Pak and P. Paroubek. Twitter as a corpus for sentiment analysis and opinion mining. In
    Proceedings of the Seventh conference on International Language Resources and Evaluation
    (LREC’10), Valletta, Malta, May 2010. European Language Resources Association (ELRA).
20. B. Pang and L. Lee. Opinion mining and sentiment analysis. Found. Trends Inf. Retr., 2:1–
    135, January 2008.
21. R. S. Purves, P. Clough, C. B. Jones, A. Arampatzis, B. Bucher, D. Finch, G. Fu, H. Joho,
    A. K. Syed, S. Vaid, and B. Yang. The design and implementation of spirit: a spatially aware
    search engine for information retrieval on the internet. Int. J. Geogr. Inf. Sci., 21:717–745,
    January 2007.
22. G. Quercini, H. Samet, J. Sankaranarayanan, and M. D. Lieberman. Determining the spatial
    reader scopes of news sources using local lexicons. In Proceedings of the 18th SIGSPATIAL
    International Conference on Advances in Geographic Information Systems, GIS ’10, pages
    43–52, New York, NY, USA, 2010. ACM.
23. R. E. Roth, K. S. Ross, B. G. Finch, W. Luo, and A. M. MacEachren. A user-centered ap-
    proach for designing and developing spatiotemporal crime analysis tools. Zurich, Switzer-
    land, 14-17th September, 2010 2010. GIScience.
24. R. E. Schapire and Y. Singer. BoosTexter: A Boosting-based System for Text Categorization.
    Machine Learning, 39(2/3):135–168, 2000.
25. N. Stokes, Y. Li, A. Moffat, and J. Rong. An empirical study of the effects of nlp components
    on geographic ir performance. Int. J. Geogr. Inf. Sci., 22:247–264, January 2008.
26. B. E. Teitler, M. D. Lieberman, D. Panozzo, J. Sankaranarayanan, H. Samet, and J. Sperling.
    Newsstand: a new view on news. In Proceedings of the 16th ACM SIGSPATIAL international
    conference on Advances in geographic information systems, GIS ’08, pages 18:1–18:10, New
    York, NY, USA, 2008. ACM.
27. J. Wiebe and E. Riloff. Creating subjective and objective sentence classifiers from unanno-
    tated texts. In Proceedings of the 6th International Conference on Intelligent Text Process-
    ing and Computational Linguistics (CICLing-2005), pages 486–497, Mexico City, Mexico,
    2005.
28. T. Wilson, P. Hoffmann, S. Somasundaran, J. Kessler, J. Wiebe, Y. Choi, C. Cardie, E. Riloff,
    and S. Patwardhan. Opinionfinder: a system for subjectivity analysis. In Proceedings of
    HLT/EMNLP on Interactive Demonstrations, HLT-Demo ’05, pages 34–35, Stroudsburg, PA,
    USA, 2005. Association for Computational Linguistics.
29. J. Zhang, H. Shi, and Y. Zhang. Self-organizing map methodology and google maps ser-
    vices for geographical epidemiology mapping. Digital Image Computing: Techniques and
    Applications, 0:229–235, 2009.
30. W. Zong, D. Wu, A. Sun, E.-P. Lim, and D. H.-L. Goh. On assigning place names to geogra-
    phy related web pages. In Proceedings of the 5th ACM/IEEE-CS joint conference on Digital
    libraries, JCDL ’05, pages 354–362, New York, NY, USA, 2005. ACM.