=Paper=
{{Paper
|id=Vol-1670/paper-52
|storemode=property
|title=None
|pdfUrl=https://ceur-ws.org/Vol-1670/paper-52.pdf
|volume=Vol-1670
}}
==None==
Scalable Detection of Emerging Topics and Geo-spatial Events in Large Textual Streams

Erich Schubert, Michael Weiler, and Hans-Peter Kriegel

Institut für Informatik, LMU Munich, Germany

{schube,weiler,kriegel}@dbs.ifi.lmu.de

Social media are a popular source for live textual data. This data poses several challenges due to its size, velocity, and heterogeneity. Existing methods for emerging topic detection can often detect only events of global magnitude, such as natural disasters, or can monitor only user-selected keywords or a curated set of hashtags. Interesting emerging topics may, however, be of much smaller magnitude and may involve the combination of two or more words that are not known beforehand.

We present several contributions introduced in previous work [1, 2]:

(i) A significance measure that can detect emerging topics early, long before they evolve into hot tags, by drawing upon experience from outlier detection.

(ii) An efficient online algorithm to track these statistics for all words and word-pairs with only a fixed amount of memory, and without predefined keywords.

(iii) The clustering of the detected co-trends into larger topics, because a single event will cause multiple word combinations to trend at the same time.

(iv) The incorporation of location information into this process, both to report the locality of events and to detect local-only geo-textual patterns.

The significance score provides an estimated frequency and standard deviation of words, word-pairs, and word-location information on the data stream at minimal cost. It allows for normalization across location, culture, and language and enables the detection of change events both in already frequent and in previously unseen combinations. In contrast to earlier work, it can monitor every word at every location with only a fixed amount of memory, compare the values to statistics from earlier data, and immediately report significant deviations with minimal delay.
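
The idea of scoring every word, word-pair, and word-location combination against its own history in fixed memory can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so everything below is an assumption made for illustration: the class name <code>HashedSignificanceTracker</code>, the exponentially weighted moving mean and variance, the parameters <code>alpha</code> and <code>beta</code>, and the count-min-style use of several hash buckets per token. It is not the authors' implementation.

```python
import hashlib
import math


class HashedSignificanceTracker:
    """Illustrative fixed-memory tracker of moving mean/variance per hash bucket."""

    def __init__(self, num_buckets=1 << 16, num_hashes=3, alpha=0.1, beta=1e-3):
        self.num_buckets = num_buckets
        self.num_hashes = num_hashes
        self.alpha = alpha  # assumed learning rate of the moving averages
        self.beta = beta    # assumed bias term; avoids division by zero
        self.mean = [0.0] * num_buckets
        self.var = [0.0] * num_buckets

    def _buckets(self, token):
        # Derive several bucket indices per token from salted BLAKE2b digests.
        data = token.encode("utf-8")
        return [
            int.from_bytes(
                hashlib.blake2b(data, digest_size=8, salt=bytes([i])).digest(),
                "big",
            ) % self.num_buckets
            for i in range(self.num_hashes)
        ]

    def score(self, token, freq):
        # Collisions can only inflate a bucket's mean, so use the bucket
        # with the smallest mean as the estimate for this token.
        b = min(self._buckets(token), key=lambda i: self.mean[i])
        return (freq - self.mean[b]) / (math.sqrt(self.var[b]) + self.beta)

    def process_epoch(self, freqs):
        """Score each token against past statistics, then fold the epoch in."""
        scores = {tok: self.score(tok, f) for tok, f in freqs.items()}
        # Each bucket records the largest frequency hashed into it this epoch.
        observed = {}
        for tok, f in freqs.items():
            for b in self._buckets(tok):
                observed[b] = max(observed.get(b, 0.0), f)
        # Update every bucket; buckets without observations decay toward zero.
        for b in range(self.num_buckets):
            delta = observed.get(b, 0.0) - self.mean[b]
            self.mean[b] += self.alpha * delta
            self.var[b] = (1 - self.alpha) * (self.var[b] + self.alpha * delta * delta)
        return scores
```

In this sketch a token is any string key, so a word-pair or a word-location combination can be encoded as, for example, <code>"earthquake|munich"</code> (a hypothetical encoding). Memory use depends only on <code>num_buckets</code>, not on the vocabulary size, which is the property the abstract emphasizes; a token that suddenly appears after many quiet epochs receives a much higher score than one that continues at its usual frequency.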
The algorithm is capable of reporting breaking news in real time, as it happens in social media around the world. Location is modeled at different granularities, such that events can be detected at a city, country, or global level by incorporating OpenStreetMap data, or at particular coordinates.

===References===

[1] E. Schubert, M. Weiler, and H.-P. Kriegel. SigniTrend: Scalable Detection of Emerging Topics in Textual Streams by Hashed Significance Thresholds. In: Proc. 20th ACM Int. Conf. on Knowledge Discovery and Data Mining (SIGKDD). 2014, pp. 871–880.

[2] E. Schubert, M. Weiler, and H.-P. Kriegel. SPOTHOT: Scalable Detection of Geo-spatial Events in Large Textual Streams. In: Proc. 28th Int. Conf. on Scientific and Statistical Database Management (SSDBM). 2016.