                                Geolocation Extraction From Reddit Text Data
                                Mila Stillman∗ , Anna Kruspe
Technische Hochschule Nürnberg, Fakultät Informatik, Hohfelderstraße 40, Nürnberg
                                Hochschule München, Lothstraße 64, München


                                           Abstract
Reddit has been an important source of news and information exchange over the past two decades. The
platform is composed of communities forming spaces where users with common interests
can share their experiences and opinions. Many of these online communities, called 'subreddits', are
highly active, with some subreddits having tens of millions of followers. The extraction of geographic
information from social media text has become an increasingly active research topic in recent years. While
most of this work has been conducted on Twitter data, recent restrictions to Twitter's Application Programming
Interface (API) have limited access for academic research. Reddit.com could be a good alternative
source due to its depth of discussions and communal interactions. Additionally, it allows users more anonymity,
rendering it more ethically acceptable because it reduces users' traceability. However, this
characteristic also complicates objective verification and evaluation. Nevertheless, we believe
that Reddit is a rich source of information for training geolocation extraction models.
This paper presents first experiments on extracting geographic data from location-based
communities on Reddit and classifying posts at the city level using BERT for text classification.

                                           Keywords
                                           Geospatial Text Mining, Social Media, BERT Classification, Reddit




                                1. Introduction
Geolocation extraction from text, also called geoparsing, is a subfield of geospatial data extraction
and analysis, and refers to determining the geographic location of unstructured textual
data, including social media data. Geoparsing can benefit the discovery of common
interests, opinions or issues in specific geographic locations at different scales. The authors in
                                [1] identified seven main application domains of geoparsing, namely geographic information
                                retrieval, support during disaster events, crime management, disease spread management, traffic
                                management, tourism management and spatial humanities. Geolocated social media posts were
                                also found to be useful in remote sensing applications [2]. For example, social media data from
                                Twitter was used to classify buildings in urban regions in combination with remote sensing
                                images [3], to quickly obtain situational awareness in developing crises [4, 5], and to support
                                the detection of misinformation spread during the Russo-Ukrainian war [6, 7, 8].
Reddit is a well-established platform for online discourse, with 52 million daily active users,
303.4 million posts and two billion comments in 2020 [9]. According to statista.com there are

                                GeoExT 2024: Second International Workshop on Geographic Information Extraction from Texts at ECIR 2024, March 24,
                                2024, Glasgow, Scotland
                                ∗
                                    Research conducted while at Technische Hochschule Nürnberg. Supported by Hochschule München.
mila.stillman@hm.edu (M. Stillman); anna.kruspe@hm.edu (A. Kruspe)
ORCID: 0009-0008-8976-8855 (M. Stillman); 0000-0002-2041-9453 (A. Kruspe)
                                         © 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).




CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073
around two billion monthly visits to Reddit, and more than 130,000 active communities, or
subreddits [10]. Subreddits are sections of the website where discussions are organized in a
forum-like fashion. Moderators ensure that posts are relevant and adhere to community rules,
with automation playing an increasing role in the process [9]. Redditors are predominantly
male (around 65%), and the majority of web traffic comes from English-speaking countries: the USA
(around 48.5%), Canada and the UK (each approximately 7%) and Australia (4.63%). Germany is
the sole non-English-speaking country within the top five countries active on Reddit (around
3%) [11]. Subreddits cover almost any conceivable topic, including location-related ones:
subreddits exist for many large cities worldwide, as well
as for smaller cities with local communities. In this paper, we conduct first experiments on
Reddit posts from location-related subreddits using a purely textual approach. The premise
underlying our study is that posts and comments from location-focused subreddits with ongoing
moderation are likely to contain textual cues that enable automated methods to determine their
geographic origin directly from the content.


2. Related work
The extraction of geographic information from text is challenging due to the considerable
linguistic variability inherent in such data and its complex relations within a geospatial frame-
work. A large part of the research has focused on Named Entity Recognition (NER), i.e.,
the categorization of important entities within unstructured text, specifically location-related
entities [1]. To date, research using social media data has mainly focused on Twitter due
to the size of the platform and its previously available API, which included precise geolocation
in the form of GPS-based coordinates and place locations with variable granularity. Geolocation
extraction based on textual content, on user-location profiling,
or on a combination of both has been conducted, e.g., in the shared task on geolocation prediction
at the Workshop on Noisy User-generated Text (WNUT) in 2016 [12].
   The authors in [13] used statistical machine learning methods such as K-Nearest Neighbor
and Support Vector Machines to extract geolocation from Tweets without an indicated location.
Others used deep neural networks [14] for this task. The authors in [15] used CNNs and user
metadata, achieving accuracies of 52.8% and 92.1% on city- and country-level predictions, re-
spectively. Research has also employed Knowledge Graphs (KGs) generated from gazetteers and other
geographic resources to link entities mentioned in Tweets with relevant geographic information
[16, 17]. Wing and Baldridge [18] used a hierarchical approach via logistic regression models
on nodes in geodesic grids. Huang and Carley [19] worked on hierarchical approaches using
initial classification at a country-level, and subsequently refining it to city-level. Recently,
Transformer-based language models, such as Bidirectional Encoder Representations from Trans-
formers (BERT) [20], have also been used extensively due to their success in other NLP tasks
[21, 22, 23, 24, 25]. For example, the authors in [21] achieved an F1-score of 0.77 for location detection in Indonesia.
Hybrid approaches have also been utilized, e.g., the authors in [25] used geographic data from
OpenStreetMap [26] fused with a BERT model, achieving an F1-score of up to 0.88 on certain
Twitter datasets. More detailed information about previous work on geoparsing using Twitter
data can be found in the survey by Hu et al. [1].
Since 2023, the Twitter API has no longer been affordable for academic research. In addition, Twitter reduced
the share of Tweets with precise geolocation to less than 0.2% in 2021 [27], and many of the user-set place
names and geolocations on Twitter have proven unreliable [28]. Research therefore needs to
be extended beyond those labels and that platform. Reddit is one of the strong alternative candidates,
with a freely available API. To our knowledge, research on geoparsing using Reddit data has been
limited due to the lack of ground-truth data. Harrigian [29] attempted to classify Reddit user
geolocation using subreddit metadata and extracting localized data via queries for questions
such as 'where do you live?'. The results showed that models trained on Reddit data significantly
outperformed transferred models trained on Tweets.


3. Methodology and Results
3.1. Data collection and model
We used a subset of the Reddit Data Dump dataset from Academic Torrents1 , which was
collected using the Pushshift API [30]. Specifically, we used posts from the last quarter of 2022
for training and validation, and tested the model on data from February 2023 to avoid
immediate temporal influences. Subreddits of large cities by population in the USA and in
Germany were chosen manually for the classification task (the full list is available at
https://github.com/Milast/GeoExt_Reddit). Overall, there were 181 subreddits from 126 cities,
including those named after cities, such as '/rBerlin'2 , as well as other location-related subreddits
such as '/rChicagoFoods' and '/rportlandmusic', which were aggregated under the relevant
city name. To train the model, only cities with over 100 posts during the mentioned time frame
were included in the experiments, resulting in 57 cities being selected.
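The aggregation and thresholding steps above can be sketched as follows; the subreddit-to-city mapping and all helper names are our own illustration under stated assumptions, not the authors' code:

```python
from collections import Counter

# Illustrative mapping from (lowercased) subreddit name to city label;
# the study used 181 manually chosen subreddits covering 126 cities.
SUBREDDIT_TO_CITY = {
    "berlin": "berlin",
    "chicagofoods": "chicago",
    "portlandmusic": "portland",
}

def select_cities(posts, mapping, min_posts=100):
    """Count posts per aggregated city label and keep only cities
    with at least `min_posts` posts (100 in the paper)."""
    counts = Counter(mapping[sub] for sub, _text in posts if sub in mapping)
    return {city for city, n in counts.items() if n >= min_posts}
```

In the paper, applying this 100-post threshold reduced the 126 candidate cities to 57.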
A pre-trained BERT model from Hugging Face [31] was used for the text classification. We
chose the multilingual bert-base-multilingual-cased model because its case sensitivity
helps deal with the capitalization of nouns in the German language. The title
and content of each Reddit post were concatenated and used for the classification. The code from the
tutorial by Winastwan [32] was modified to fit the task. The model comprises a BERT tokenizer
with a maximum sequence length of 512 tokens, which generates the input for the BERT transformer model
with 12 transformer encoder layers and a hidden size of 768. A classification layer is used to
discriminate between the cities. The fine-tuned model was then evaluated on the test data.
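A minimal sketch of this setup using the Hugging Face transformers library; the function names are ours, and the heavy imports are kept inside the loader so the preprocessing can be used independently:

```python
MODEL_NAME = "bert-base-multilingual-cased"  # 12 encoder layers, hidden size 768
MAX_LENGTH = 512  # maximum sequence length accepted by the tokenizer

def combine_post(title, body):
    """Concatenate a post's title and content, as done for classification."""
    return (title.strip() + " " + body.strip()).strip()

def build_classifier(num_cities):
    """Load pre-trained multilingual BERT with a classification head over
    the city labels (imports are local to keep this sketch lightweight)."""
    from transformers import AutoTokenizer, AutoModelForSequenceClassification
    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModelForSequenceClassification.from_pretrained(
        MODEL_NAME, num_labels=num_cities
    )
    return tokenizer, model

def encode(tokenizer, title, body):
    """Tokenize a combined post, truncating to 512 tokens."""
    return tokenizer(
        combine_post(title, body),
        truncation=True,
        max_length=MAX_LENGTH,
        padding="max_length",
        return_tensors="pt",
    )
```

Fine-tuning then proceeds with a standard cross-entropy objective over the 57 city labels, e.g. via the transformers Trainer API.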

3.2. Experiments and results
First, a country-level classification was performed and achieved 98.2% accuracy (see Appendix C).
This result was somewhat expected due to the nature of BERT embeddings, which cluster
text more strongly by language than by semantic content [33]. Since the focus was on two countries
with a large language difference (the US and Germany), we then conducted a non-hierarchical
city-level classification, assuming that country-level differences would be captured automatically.
After the city-level classification of the original text, we used the spaCy library in Python [34]
for NER in order to keep only posts that included Named Entities (NEs) that could imply a
1
    www.academictorrents.com
2
    The notation ’/r’ indicates a subreddit on the platform.
Table 1
Average and macro testing results for the classification task using BERT for the three experiments
                                             Precision   Recall   F1-score   Precision   Recall   F1-score
                                              (macro)    (macro)   (macro)     (avg)      (avg)     (avg)
 Original text                                  0.64       0.44      0.50       0.54       0.47      0.48
 Text filtered for location-inferring NER       0.68       0.54      0.58       0.61       0.57      0.57
 Text filtered for location-specific NER        0.77       0.65      0.69       0.72       0.68      0.69


location, such as organizations, events, etc. The NER tags used and their meanings can be found
in Appendix A. After seeing an improvement of 0.08 in the overall F1-score, we further
filtered the posts to keep those with NEs related specifically to a geographic location,
namely 'GPE' (countries, cities, states) and 'LOC' (non-GPE locations, mountain ranges, bodies
of water). The results are summarized in Table 1. There was a significant improvement in
the classification after filtering the posts for NEs that could potentially imply a location, and
another improvement after filtering for location-specific NEs. It is important to mention that
the size of the dataset changed after filtering, which might have had an impact on the results.
The size of the training and testing data, and the full results per city for the three experiments,
are given in Appendix B and Appendix C, respectively. Some communities had more accurate
classification results in all three experiments, namely large German cities such as Munich,
Berlin and Hamburg, as well as Las Vegas and Kansas City, achieving up to 70-80% accuracy,
while other cities reached only around 50%. Interestingly, both Las Vegas and Kansas City included an
aggregated '_r4r' subreddit, which stands for 'Redditors for Redditors' and facilitates
connections between users. These subreddits improved the classification results to a larger
extent than other topics such as food or music, and could be studied further in the future.
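The NER-based filtering can be sketched as follows; `filter_posts` and the tag-set names are our own, and the spaCy pipeline is passed in as an argument so any model (e.g. one loaded via `spacy.load("en_core_web_sm")`) can be used:

```python
# Tags whose presence lets a post imply a location (see Appendix A);
# the stricter second filter keeps only GPE and LOC entities.
LOCATION_INFERRING_TAGS = {
    "GPE", "LOC", "FAC", "ORG", "MONEY", "NORP",
    "EVENT", "PRODUCT", "WORK_OF_ART",
}
LOCATION_SPECIFIC_TAGS = {"GPE", "LOC"}

def filter_posts(nlp, posts, tags=LOCATION_SPECIFIC_TAGS):
    """Keep only posts containing at least one named entity
    whose label is in `tags`."""
    return [
        text for text in posts
        if any(ent.label_ in tags for ent in nlp(text).ents)
    ]
```

Running the stricter filter shrinks the dataset further, which is why the paper reports data volumes per experiment (Appendix B).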


4. Conclusion and future work
In this paper, we presented initial results for geoparsing experiments on recent Reddit
data. The analysis was done in a statistical manner without attempting to locate specific
users, adhering to ethical recommendations. We performed these experiments using BERT
classification on the country and city levels. For countries, the classification worked near-perfectly
due to the influence of language. For cities, we also included experiments where NER was used
to detect posts that contain either location-inferring NEs or location-specific NEs, showing
improved classification accuracy. Many influences should be analyzed next,
such as linguistic ambiguities, the influence of specific NER tags and aggregated subreddits, and
extension to other countries. Geoparsing using Large Language Models (LLMs) could also be
compared to supervised classification, as could utilizing KGs constructed from gazetteers and
other sources, with or without the assistance of LLMs. Other future work could address hierarchical
classification, using additional training data and different BERT models, and examining user
networks in the localized communities. Although working with Reddit data is challenging due
to the anonymous nature of the platform and the lack of explicit geolocation information, our
experiments provide an interesting basis for further research on geoparsing.
References
 [1] X. Hu, Z. Zhou, H. Li, Y. Hu, F. Gu, J. Kersten, H. Fan, F. Klan, Location reference recognition
     from texts: A survey and comparison, ACM Computing Surveys 56 (2023) 1–37.
 [2] X. X. Zhu, Y. Wang, M. Kochupillai, M. Werner, M. Häberle, E. J. Hoffmann, H. Taubenböck,
     D. Tuia, A. Levering, N. Jacobs, et al., Geoinformation harvesting from social media data:
     A community remote sensing approach, IEEE Geoscience and Remote Sensing Magazine
     10 (2022) 150–180.
 [3] M. Häberle, M. Werner, X. X. Zhu, Geo-spatial text-mining from twitter–a feature space
     analysis with a view toward building classification in urban regions, European journal of
     remote sensing 52 (2019) 2–11.
 [4] A. Kruspe, Few-shot tweet detection in emerging disaster events, in: AI+HADR Workshop
     @ NeurIPS, 2019.
 [5] A. Kruspe, J. Kersten, F. Klan, Review article: Detection of actionable tweets in crisis
     events, Natural Hazards and Earth System Sciences (NHESS) Special Issue: Groundbreaking
     technologies, big data, and innovation for disaster risk modelling and reduction (2020).
 [6] S. Rode-Hasinger, M. Häberle, D. Racek, A. Kruspe, X. X. Zhu, TweEvent: A dataset of
     Twitter messages about events in the Ukraine conflict, in: Information Systems for Crisis
     Response and Management (ISCRAM), 2023.
 [7] S. Rode-Hasinger, A. Kruspe, X. X. Zhu, True or False? Detecting False Information on
     Social Media Using Graph Neural Networks, in: Proceedings of the Eighth Workshop
     on Noisy User-generated Text (W-NUT 2022), Association for Computational Linguistics,
     Gyeongju, Republic of Korea, 2022, pp. 222–229. URL: https://aclanthology.org/2022.wnut-1.
     24.
 [8] J. Niu, M. Stillman, P. Seeberger, A. Kruspe, A dataset of Open Source Intelligence (OSINT)
     Tweets about the Russo-Ukrainian war, in: Information Systems for Crisis Response and
     Management (ISCRAM), 2024.
 [9] Reddit.com, Reddit’s 2020 year in review, 2020. URL: https://www.redditinc.com/blog/
     reddits-2020-year-in-review.
[10] Statista, Statista.com, 2024. URL: https://www.statista.com/statistics/443332/
     reddit-monthly-visitors/.
[11] SimilarWeb, Reddit.com, 2024. URL: https://www.similarweb.com/website/reddit.com/
     #overview.
[12] B. Han, A. Rahimi, L. Derczynski, T. Baldwin, Twitter geolocation prediction shared task
     of the 2016 workshop on noisy user-generated text, in: Proceedings of the 2nd Workshop
     on Noisy User-generated Text (WNUT), 2016, pp. 213–217.
[13] T. Julie, M. Sadouanouan, T. Yaya, A geolocation approach for tweets not explicitly
     georeferenced based on machine learning, in: International Symposium on Distributed
     Computing and Artificial Intelligence, Springer, 2023, pp. 223–231.
[14] I. Lourentzou, A. Morales, C. Zhai, Text-based geolocation prediction of social media users
     with neural networks, in: 2017 IEEE International Conference on Big Data (Big Data),
     IEEE, 2017, pp. 696–705.
[15] B. Huang, K. M. Carley, On predicting geolocation of tweets using convolutional neural
     networks, in: Social, Cultural, and Behavioral Modeling: 10th International Conference,
     SBP-BRiMS 2017, Washington, DC, USA, July 5-8, 2017, Proceedings 10, Springer, 2017, pp.
     281–291.
[16] T. Miyazaki, A. Rahimi, T. Cohn, T. Baldwin, Twitter geolocation using knowledge-based
     methods, in: Proceedings of the 2018 EMNLP Workshop W-NUT: The 4th Workshop on
     Noisy User-generated Text, 2018, pp. 7–16.
[17] F. Lovera, Y. Cardinale, D. Buscaldi, T. Charnois, A knowledge graph-based method for
     the geolocation of tweets, in: Workshop Proceedings of the 19th International Conference
     on Intelligent Environments (IE2023), IOS Press, 2023, pp. 53–62.
[18] B. Wing, J. Baldridge, Hierarchical discriminative classification for text-based geolocation,
     in: Proceedings of the 2014 conference on empirical methods in natural language processing
     (EMNLP), 2014, pp. 336–348.
[19] B. Huang, K. M. Carley, A hierarchical location prediction neural network for twitter user
     geolocation, arXiv preprint arXiv:1910.12941 (2019).
[20] J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, Bert: Pre-training of deep bidirectional
     transformers for language understanding, arXiv preprint arXiv:1810.04805 (2018).
[21] L. F. Simanjuntak, R. Mahendra, E. Yulianti, We know you are living in bali: Location
     prediction of twitter users using bert language model, Big Data and Cognitive Computing
     6 (2022) 77.
[22] I. Salogni, Salogni at geolingit: Geolocalization by fine-tuning bert, in: Proceedings of the
     Eighth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian.
     Final Workshop (EVALITA 2023), CEUR. org, Parma, Italy, 2023.
[23] K. Lutsai, C. H. Lampert, Geolocation predicting of tweets using bert-based models, arXiv
     preprint arXiv:2303.07865 (2023).
[24] Y. Scherrer, N. Ljubešić, Helju@ vardial 2020: Social media variety geolocation with bert
     models, in: Proceedings of the 7th Workshop on NLP for Similar Languages, Varieties and
     Dialects, 2020, pp. 202–211.
[25] X. Hu, Z. Zhou, J. Kersten, M. Wiegmann, F. Klan, Gazpne2: A general and annotation-free
     place name extractor for microblogs fusing gazetteers and transformer models (2021).
[26] OpenStreetMap contributors, Planet dump retrieved from https://planet.osm.org,
     https://www.openstreetmap.org, 2017.
[27] A. Kruspe, M. Häberle, E. J. Hoffmann, S. Rode-Hasinger, K. Abdulahhad, X. X. Zhu,
     Changes in twitter geolocations: Insights and suggestions for future usage, arXiv preprint
     arXiv:2108.12251 (2021).
[28] A. Kumar, J. P. Singh, N. P. Rana, Authenticity of geo-location and place name in tweets
     (2017).
[29] K. Harrigian, Geocoding without geotags: A text-based approach for Reddit, arXiv preprint, 2018.
[30] J. Baumgartner, S. Zannettou, B. Keegan, M. Squire, J. Blackburn, The pushshift reddit
     dataset, in: Proceedings of the international AAAI conference on web and social media,
     volume 14, 2020, pp. 830–839.
[31] Hugging Face, huggingface.co, 2024. URL: https://huggingface.co/bert-base-multilingual-cased.
[32] R. Winastwan, Text classification with bert in pytorch, 2021. URL: https://
     towardsdatascience.com/text-classification-with-bert-in-pytorch-887965e5820f.
[33] J. Libovickỳ, R. Rosa, A. Fraser, How language-neutral is multilingual bert?, arXiv preprint
     arXiv:1911.03310 (2019).
[34] M. Honnibal, I. Montani, spaCy 2: Natural language understanding with Bloom embed-
     dings, convolutional neural networks and incremental parsing, 2017. To appear.



A. Named Entity Recognition (NER) tags

Table 2
NER tags used during the city-level classification experiments.
              NER Tag            Description
              GPE                Countries, cities, states.
              LOC                Non-GPE locations, mountain ranges, bodies of water.
              FAC                Buildings, airports, highways, bridges, etc.
              ORG                Companies, agencies, institutions, etc.
              MONEY              Monetary values, including unit.
              NORP               Nationalities or religious or political groups.
              EVENT              Named hurricanes, battles, wars, sports events, etc.
              PRODUCT            Objects, vehicles, foods, etc. (Not services.)
              WORK_OF_ART        Titles of books, songs, etc.



B. Data volume

Table 3
Data volume for the training and testing data for the three experiments.
                                                          Training data    Testing data
               Original text (and country-level)             114,530         45,365
               Text filtered for location-inferring NER       83,288         33,174
               Text filtered for location-specific NER        52,996         21,385
C. Extended results for the country-level and city-level
   experiments

Table 4
Results for the country-level classification on the testing data.
                                         precision    recall   f1-score    support
                        USA                 0.99       0.99       0.99      43240
                        Germany             0.79       0.79       0.79       2125
                        Weighted Avg        0.98       0.98       0.98      45365
                        Accuracy                                  0.98      45365
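The weighted-average and accuracy rows in Table 4 follow the usual scikit-learn classification-report conventions; a minimal sketch with made-up predictions (not the paper's data) illustrates how these values are computed:

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Toy labels: 8 US posts and 2 German posts, with two misclassifications.
y_true = ["usa"] * 8 + ["germany"] * 2
y_pred = ["usa"] * 7 + ["germany"] + ["germany", "usa"]

# The weighted average weights each class's score by its support,
# which is why it closely tracks the majority class (USA) in Table 4.
prec, rec, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="weighted"
)
acc = accuracy_score(y_true, y_pred)
print(f"weighted P={prec:.2f} R={rec:.2f} F1={f1:.2f} acc={acc:.2f}")
# → weighted P=0.80 R=0.80 F1=0.80 acc=0.80
```

The macro average (Table 1) would instead weight both classes equally, penalizing the minority class's errors more visibly.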




Figure 1: Normalized confusion matrix on original text of Reddit submissions.
Figure 2: Normalized confusion matrix on Reddit submissions after NER filtering for location-inferring
named entities.




Figure 3: Normalized confusion matrix after NER filtering for location-specific named entities.
Table 5
Results by city for the three experiments including precision, recall, F1-score and number of supporting
test samples.
                     Location-specific NER              Location-inferring NER          Original text
 City                 P      R      F1    Support     P      R      F1    Support     P      R        F1    Support
 berlin              0.87   0.84   0.86     474      0.73   0.76   0.74     629      0.52   0.67     0.59     867
 frankfurt           0.94   0.77   0.84     142      0.92   0.67   0.77     165      0.86   0.58     0.69     192
 hamburg             0.86   0.76   0.81     188      0.72   0.79   0.75     219      0.66   0.71     0.68     257
 leipzig             1.00   0.66   0.79     111      1.00   0.55   0.71     133      1.00   0.53     0.69     143
 munich              0.70   0.86   0.77     371      0.69   0.76   0.72     470      0.65   0.72     0.69     535
 stuttgart           1.00   0.66   0.79     99       1.00   0.62   0.76     110      0.96   0.56     0.71     131
 albuquerque         0.44   0.59   0.50     248      0.49   0.46   0.48     437      0.52   0.36     0.43     601
 atlanta             0.81   0.74   0.77     424      0.61   0.57   0.59     672      0.50   0.46     0.48     938
 austin              0.58   0.66   0.62    1225      0.51   0.56   0.53    1990      0.39   0.49     0.43    2845
 bakersfield         0.98   0.56   0.71     102      0.97   0.44   0.61     152      0.97   0.32     0.48     217
 baltimore           0.74   0.61   0.67     303      0.60   0.56   0.58     452      0.72   0.45     0.55     564
 boston              0.81   0.78   0.79     848      0.77   0.66   0.71    1185      0.66   0.54     0.59    1460
 charlotte           0.63   0.68   0.65     623      0.52   0.55   0.54     972      0.25   0.51     0.34    1289
 chicago             0.60   0.77   0.67     851      0.50   0.65   0.57    1331      0.32   0.58     0.41    1897
 cleveland           0.80   0.62   0.70     352      0.79   0.53   0.63     489      0.75   0.42     0.54     615
 colorado springs    0.88   0.50   0.64     221      0.55   0.38   0.45     360      0.60   0.29     0.39     479
 columbia            0.94   0.44   0.60     104      0.45   0.60   0.52     219      0.48   0.52     0.50     307
 columbus            0.70   0.66   0.68     668      0.55   0.56   0.55     970      0.50   0.48     0.49    1245
 dallas              0.82   0.66   0.73     443      0.76   0.50   0.60     697      0.59   0.39     0.47     965
 denver              0.64   0.72   0.68     954      0.46   0.66   0.54    1512      0.43   0.53     0.48    2040
 el paso             0.90   0.54   0.68     101      0.90   0.42   0.57     134      0.90   0.32     0.47     171
 fortworth           0.90   0.46   0.61     158      0.90   0.34   0.49     249      0.86   0.24     0.38     345
 fresno              0.99   0.66   0.79     122      0.99   0.46   0.63     180      0.99   0.37     0.54     234
 indianapolis        0.63   0.55   0.59     245      0.56   0.40   0.47     443      0.67   0.32     0.43     586
 jacksonville        0.79   0.50   0.62     222      0.44   0.40   0.42     328      0.57   0.34     0.43     445
 kansas city         0.76   0.80   0.78     218      0.71   0.82   0.76     537      0.68   0.78     0.73     993
 las vegas           0.89   0.85   0.87     638      0.78   0.78   0.78    1133      0.74   0.74     0.74    1723
 lincoln             0.88   0.41   0.56     109      0.81   0.31   0.45     179      0.81   0.26     0.39     219
 long beach          0.84   0.63   0.72     174      0.70   0.46   0.56     275      0.82   0.37     0.51     364
 los angeles         0.66   0.75   0.70     698      0.43   0.67   0.52    1042      0.33   0.52     0.40    1486
 louisville          0.74   0.54   0.62     350      0.79   0.42   0.55     518      0.58   0.35     0.44     659
 memphis             0.93   0.49   0.64     206      0.93   0.40   0.56     338      0.78   0.31     0.44     478
 miami               0.88   0.74   0.80     395      0.66   0.58   0.62     532      0.74   0.45     0.56     725
 milwaukee           0.93   0.60   0.73     310      0.81   0.53   0.64     447      0.77   0.47     0.58     541
 minneapolis         0.75   0.60   0.67     290      0.73   0.45   0.56     445      0.69   0.35     0.46     628
 nashville           0.88   0.67   0.76     400      0.63   0.51   0.57     606      0.56   0.40     0.47     835
 new orleans         0.49   0.51   0.50     434      0.25   0.52   0.34     760      0.53   0.38     0.44    1047
 new york            0.64   0.65   0.64     99       0.78   0.45   0.57     159      0.74   0.33     0.46     212
 oakland             0.89   0.61   0.72     216      0.89   0.49   0.63     296      0.88   0.36     0.51     376
 oklahoma            0.87   0.68   0.77     98       0.82   0.55   0.66     171      0.85   0.45     0.58     229
 omaha               0.87   0.63   0.73     221      0.84   0.47   0.60     349      0.83   0.34     0.48     483
 philadelphia        0.56   0.70   0.62     508      0.63   0.54   0.59     935      0.36   0.52     0.42    1473
 phoenix             0.79   0.78   0.79     417      0.70   0.58   0.63     650      0.57   0.47     0.52     884
 portland            0.63   0.66   0.64     426      0.51   0.55   0.53     757      0.36   0.41     0.38    1159
 raleigh             0.46   0.70   0.55     344      0.43   0.59   0.50     583      0.38   0.49     0.43     746
 sacramento          0.42   0.69   0.52     518      0.41   0.52   0.46     880      0.29   0.49     0.37    1236
 san antonio         0.81   0.51   0.62     321      0.63   0.40   0.49     559      0.65   0.27     0.38     825
 san diego           0.70   0.72   0.71     532      0.42   0.62   0.50     826      0.49   0.45     0.47    1179
 san francisco       0.53   0.69   0.60     526      0.61   0.56   0.58     833      0.48   0.50     0.49    1115
 san jose            0.85   0.64   0.73     271      0.78   0.50   0.61     406      0.65   0.41     0.51     527
 seattle             0.58   0.76   0.66    1125      0.51   0.64   0.57    1580      0.41   0.56     0.47    2014
 tampa               0.83   0.69   0.75     280      0.79   0.54   0.64     409      0.57   0.39     0.46     606
 tucson              0.96   0.60   0.74     253      0.95   0.45   0.61     408      0.83   0.35     0.49     581
 tulsa               0.90   0.62   0.74     194      0.87   0.43   0.58     308      0.88   0.33     0.48     404
 virginia            0.76   0.69   0.72     377      0.66   0.61   0.63     525      0.67   0.51     0.58     665
 washington          0.56   0.70   0.62     688      0.53   0.55   0.54    1019      0.51   0.47     0.49    1311
 wichita             0.96   0.53   0.68     150      0.97   0.43   0.59     211      0.95   0.32     0.48     274