BroDyn’18: Workshop on Analysis of Broad
                             Dynamic Topics over Social Media?

                         Tamer Elsayed1 , Walid Magdy2 , Mucahid Kutlu1 , Maram Hasanain1 , and
                                                    Reem Suwaileh1
                     1
                          Computer Science and Engineering Department, Qatar University, Doha, Qatar
                                    2
                                      School of Informatics, University of Edinburgh, UK
                            telsayed@qu.edu.qa, wmagdy@inf.ed.ac.uk, mucahidkutlu@qu.edu.qa,
                                   maram.hasanain@qu.edu.qa, reem.suwaileh@qu.edu.qa


                            Abstract. Social media streams are flooded with posts related to topics
                            that are broad (i.e., cover several sub-topics) and dynamic (i.e., develop
                            over time) which attract long-standing user interests. Posts about topics
                            like “Brexit” or “UK elections” are hard to miss any day while these top-
                            ics are “hot”, yet research on identifying and analyzing posts on such type
                            of topics is still in its infancy. The BroDyn workshop aims at building a
                            community interested in developing and exchanging ideas and methods
                            for analyzing social media for broad dynamic topics. It also aims at un-
                            derstanding the limitations of existing techniques in answering emerging
                            information needs for such topics, and proposing new techniques, eval-
                            uation methods, and test collections to address these limitations. The
                            workshop is designed to bring together audience at all levels, including
                            researchers from academia and industry as well as potential users, to cre-
                            ate a forum for discussing recent advances in this area. The workshop,
                            in its first version, featured three papers that span different aspects of
                            the target research area.


                 1        Introduction

                 For long time, information retrieval (IR) research has mostly focused on infor-
                 mation needs that are short-term and narrow in scope. That was manifested in
                 a large body of work on the traditional ad-hoc search task [3], which naturally
                 belongs to that type. Although tasks such as information filtering and topic
                 tracking (which are both long-term by definition) have earlier attracted IR re-
                 searchers [1, 16], mostly on news streams, the limited volume and low frequency
                 of documents have limited the scope of information needs.
                     Since the emergence of social media platforms, new scenarios of IR were re-
                 quired due to the shift in the information needs of users on those platforms [13,
                 19]. That indeed motivated the proposal of new tasks in the IR research commu-
                 nity represented in the TREC Microblog track [15] and its subsequent versions [8,
                 9]. Several attempts for modeling practical IR tasks were introduced through the
                 ?
                     https://sites.google.com/view/brodyn2018/


BroDyn 2018: 1st Workshop on Analysis of Broad Dynamic Topics over Social Media @ ECIR 2018, Grenoble, France   1
tracks between 2011 and 2016, e.g., tweet filtering [18], tweet timeline genera-
tion [7], and real-time summarization [9].

    Over time, users developed a widespread perception of social media as a news
source that they follow (almost all day long) to get updated on their topics of
interest. Many of those topics are of long-term interest (i.e., span or stay active
for long period of time), very broad (i.e., cover many aspects or sub-topics),
and dynamic (i.e., develop and change focus over time). For example, following
tweets related to a topic such as “UK Elections” requires tracking posts about
several sub-topics (such as candidates, campaigns, political views, election pro-
cess, etc.) for a long period of time [6]. Moreover, the sub-topics change over
time, e.g.,“debates” is an important one before the elections, while “election re-
sults” is the most important one during voting; and a sub-topic sometimes spans
a very short period of time, e.g., press statements by candidates and leaked con-
tent about them. Such kind of broad and dynamic topics can be running for few
weeks, such as crisis events (e.g., “Irma hurricane”) [14], or up to years, such as
“the Syrian conflict”. Other examples of broad dynamic topics include“Refugees
in Europe”, “GCC crisis”, “Brexit”, “North Korea and US conflict” to name a
few. The diversity of such topics and the importance of meeting the needs of
following and analyzing them warrant a matching focus from the IR research
community [10, 11].

    Analyzing broad dynamic topics over social media has several research chal-
lenges. Systems that analyze social media for that type of topics require adaptive
techniques to effectively capture the different and changing aspects of the topics.
They also have to be scalable to cope with the large volume of posts and the
diversity of the sub-topics, real-time to be responsive to the high velocity of the
data, and reliable to perform effectively over long periods of time. The process
might indeed cover several steps including retrieval and filtering, topic/sub-topic
detection, topic modeling, and summarization among many. Advanced spam and
bot detection techniques might also be required to tackle the changes in content
and techniques of spammers. Furthermore, a new evaluation framework for such
domain is also needed with novel evaluation measures that capture the nature
of topics (and thus systems), and new large reusable datasets that enable the
researchers to run meaningful and representative experiments.

    To advance the research work in that area, we organized the BroDyn work-
shop. BroDyn aims to engage with the IR community interested in different
technologies applied to social media such as filtering, summarization, spam de-
tection, but focusing on broad and dynamic topics. Moreover, the theme of the
workshop concerns, besides IR researchers, a wide spectrum of potential users
of the needed technology, such as journalists, historians, politicians, social scien-
tists and analysts [17]. This shows the potential impact of the research needed
in this area. While there have been many workshops on social media, to the best
of our knowledge, this workshop is the first that focuses on broad and dynamic
topics in that domain.


                                                                                        2
2   Objectives
The BroDyn workshop aims to achieve the following objectives:
1. Directing the attention of IR researchers to the new emerging type of infor-
   mation needs that require following broad and dynamic topics and events
   over social media streams.
2. Better understanding the limitations of our current methods and inspiring
   research on new algorithms and techniques.
3. Encouraging the design of new evaluation measures, datasets, and test col-
   lections to support the research in that area.
4. Forming a new community that brings together IR researchers as well as
   potential users who are interested in the domain.


3   Topics of Interest
To reflect the large scope of work, we encouraged submissions that span the
spectrum from retrieval and filtering to recommendation and summarization.
Our workshop solicited contributions on all topics related to the theme, focused
(but not limited to) on the following tasks:
 – Adaptive high-recall high-precision filtering / topic tracking
 – Adaptive summarization
 – Topic/sub-topic or event/sub-event detection over time
 – Retrospective generation of timelines
 – Following controversial political events/crises: identification of decision mak-
   ers, credibility/information source finding, stance/opinion mining, troll de-
   tection, fact checking
 – Cross-media filtering (i.e., over heterogeneous sources)
 – Multilingual topic/event detection
 – Online/dynamic topic modeling
 – Real-time/scalable techniques of processing high-volume streams
 – Evaluation techniques and novel test collections (specific for broad dynamic
   topics)
 – Spam and hashtag-spam detection
 – Bot and automatically-generated content detection
 – Data visualization
 – Recommendation (e.g., of hashtags/topics/sub-topics)
 – Learning techniques/deep learning over social streams

Example Datasets
We encouraged (but not required) submissions describing experiments using two
datasets, GE2017 [5] and USPresElect2016 [6], as examples of datasets on broad
dynamic topics. Researchers were free to define their own relevant tasks using
the datasets if they elect to use any of them.


                                                                                      3
    GE2017 is a dataset of around 18M tweets collected between April 28th and
June 8th 2017 on the British General Elections 2017. A set of 56 keywords related
to GE2017 was used to collect tweets on the topic. The Twitter streaming API
was used to retrieve tweets containing any of these keywords over the period
of study. The keywords consist of hashtags, accounts, and terms representing
phrases on the elections (e.g. #GE2017, general elections), politicians involved
in the elections (Theresa May, Corbyn, #jc4pm), and related topics (e.g., Brexit,
NHS). Due to the restrictions of tweets redistribution, we only shared the tweet
IDs of the dataset.
    USPresElect2016 is a dataset of 3,450 labelled tweets representing the top
50 most retweeted tweets on the US presidential elections 2016 for every day
during the period from 1 Sep 2016 to 8 Nov 2016 (the election day). The total
number of retweets for these 3,450 tweets are over 26M times. Each tweet is
labeled as: support/attack Trump/Clinton, or both, or neither (neutral).


4   Overview of Accepted Papers

We have three research papers accepted in our first workshop on broad and
dynamic topics, two full and one short, covering different aspects of the main
theme of the workshop.
    Badache et al. [2] developed a system for detecting the intensity of contradict-
ing views for a particular topic. They also introduced a measure for contradiction
intensity and a dataset built by using reviews and courses in Coursera. Detect-
ing intensity of contradicting reviews is particularly important for the analysis of
broad and dynamic topics because it is very likely to have contradicting opinions
in broad topics (e.g., supporting and opposing views about Brexit) and detecting
the intensity of contradicting views help better understand the public opinion
about a particular topic.
    Mazoyer et al. [12] proposed two approaches to collect tweets discussing news
in the French media. The first approach iteratively modifies the query sent to
Twitter API to form a vocabulary-constrained collection. The other approach
collects random tweets and dynamically clusters them in events. The approaches
were designed such that the collected tweet datasets are representative of the
true tweets activity on Twitter. Their approaches can be useful to analyze the
activity in social media about news events in real-time.
    Bulbul et al. [4] presented an approach to collect Twitter accounts of refugees
residing in Turkey. The dataset covers tweets since Syrian Crisis started, allowing
us to analyze how the opinions and feelings of Syrian refugees changed over
time. Furthermore, the paper described an initial analysis of the topics covered
by the tweets posted through these accounts. Acquiring such dataset can help
in conducting social studies on the crisis of refugees or even acquire actionable
knowledge to better understand and fulfill their needs.
    Overall, the accepted papers provide novel methods to have better insights
about broad and dynamic topics and construct datasets for further analysis.


                                                                                       4
Being the first workshop of its kind, we believe that the accepted papers will
pave the way for further development in this emerging research area.


5     Program

BroDyn was held on March 26, 2018 in Grenoble, France in conjunction with
the 40th European Conference on Information Retrieval (ECIR’18). The pro-
gram started with the keynote speech given by Michalis Vazirgiannis featuring
new techniques for event detection over social media. The speech was followed
by three presentations of the accepted papers. We allowed 30 minutes for pre-
senting full papers and 20 minutes for the short paper. After each presentation,
we moderated a discussion for 10 minutes. Once all papers are presented, we
also moderated an open discussion session in which we discussed the current
challenges and future direction for research on broad and dynamic topics. The
detailed program of the workshop is given below.

2:30 - 2:35     Opening
2:35 - 3:35     (Keynote Speech) Graph-Based Event Detection in
                Streams: The Twitter Case
                Michalis Vazirgiannis
3:35 - 4:00     Social Media based Analysis of Refugees in Turkey
                Abdullah Bulbul, Salah Haj Ismail, Çağr Kaplan
4:00 - 4:30     Coffee Break
4:30 - 5:10     Contradiction in Reviews: is it Strong or Low?
                Ismail Badache, Sebastien Fournier, Adrian-Gabriel Chifu
5:10 - 5:50     Real-time Collection of Reliable and Representative
                Tweets Datasets Related to News Events
                Béatrice Mazoyer, Céline Hudelot, Marie-Luce Viaud, Julia Cagé
5:50 - 6:20     Open Discussion


6     Reviewing Process and Program Committee

All submitted papers were peer-reviewed through a double-blind reviewing pro-
cess by at least three program committee members3 . We would like to deeply
thank all members of the committee for their great work. The committee consists
of the following members:

 – Dyaa Albakour, Signal Media
 – Mossaab Bagdouri, Walmart Labs
 – Mohand Boughanem, IRIT University Paul Sabatier Toulouse
 – Fabio Crestani, University of Lugano (USI)
3
    a meta review by the workshop chair was also added in some cases.


                                                                                    5
 – Kareem Darwish, Qatar Computing Research Institute
 – Hui Fang, University of Delaware
 – Saptarshi Ghosh, Indian Institute of Technology Kharagpur
 – Maram Hasanain, Qatar University
 – Gareth Jones, Dublin City University
 – Andreas Kaltenbrunner, NTENT
 – Preslav Nakov, Qatar Computing Research Institute
 – Lynda Tamine, University of Toulouse
 – Ingmar Weber, Qatar Computing Research Institute
 – Peilin Yang, University of Delaware


7   Organizing Committee

The workshop organizing committee has two chairs who are responsible for the
managing the reviewing process, planning the program, and developing proceed-
ings: Tamer Elsayed (assistant professor of Computer Science at Qatar Univer-
sity) and Walid Magdy (lecturer at the School of Informatics at the University
of Edinburgh). It also has three members who are responsible for publicity (web-
site, social media, mailing lists, etc.) and helping with developing the proceed-
ings: Mucahid Kutlu (post-doctorate researcher at Qatar University), Maram
Hasanain (Computer Science PhD candidate at Qatar University), and Reem
Suwaileh (MSc student and research assistant at Qatar University).
    Tamer Elsayed received his PhD in Computer Science from the University
of Maryland, College Park. His main research interests are information retrieval,
text mining, and big data analytics. He has over 50 publications in top-tier
journals (e.g., JASIST and IP&M) and conferences (e.g., SIGIR and ICWSM).
He received two best paper awards at AIRS 2015 and HCOMP 2016 conferences.
He is a member of the editorial board of IP&M Elsevier journal and served as
a PC member in top IR conferences (e.g., ACM SIGIR and ACM CIKM). His
research team has regular participation in microblog track at TREC since 2011,
including ad-hoc search, filtering, and real-time summarization tasks.
    Walid Magdy received his PhD from the School of Computing at Dublin
City University. His main research interests include computational social sci-
ence, information retrieval, and data mining. Before joining UoE in 2016, He
worked for about five years as a scientist at Qatar Computing Research Insti-
tute (QCRI). He also worked at his early career for IBM and Microsoft as a
research engineer between 2005 and 2008. He has over 60 publications in top-
tier conferences and journals, in addition to 9 patents filed under his name. Some
of his work was featured in popular press, such as CNN, BBC, Washington Post,
the Independent, Daily Mail, and Mirror.
    Mucahid Kutlu received his PhD in Computer Science and Engineering
from Ohio State University. His main research interests are big data and in-
formation retrieval with emphasis on evaluation and more specifically building
scalable test collections over the Web and social media.


                                                                                     6
   Maram Hasanain’s research interests include information retrieval over
tweets with special focus on Arabic text. Her work focuses on problems related
to evaluation, ad-hoc search, filtering, summarization and question answering.
Her publications appeared at top conferences and journals and she served as a
reviewer for several of them.
    Reem Suwaileh’s research area is information retrieval with emphasis on
topic tracking and summarization over tweets. She has been a regular member
of Qatar University team participating at TREC since 2015, where her team was
ranked second in 2015 and first in 2016 in real-time summarization track.


Acknowledgments

This work was made possible by NPRP grant# NPRP 7-1313-1-245 from the
Qatar National Research Fund (a member of Qatar Foundation). The statements
made herein are solely the responsibility of the authors.


References

 1. Allan, J., Carbonell, J.G., Doddington, G., Yamron, J., Yang, Y.: Topic detection
    and tracking pilot study final report (1998)
 2. Badache, I., Fournier, S., Chifu, A.G.: Contradiction in reviews: is it strong or low?
    In: Proceedings of the first International Workshop on Analysis of Broad Dynamic
    Topics over Social Media: BroDyn’18 (2018)
 3. Baeza-Yates, R., Ribeiro-Neto, B., et al.: Modern information retrieval, vol. 463.
    ACM press New York (1999)
 4. Bulbul, A., Kaplan, C., Haj Ismail, S.: Social media based analysis of refugees in
    turkey. In: Proceedings of the first International Workshop on Analysis of Broad
    Dynamic Topics over Social Media: BroDyn’18 (2018)
 5. Cram, L., Llewellyn, C., Hill, R.L., Magdy, W.: UK general election 2017: a twitter
    analysis. CoRR abs/1706.02271 (2017), http://arxiv.org/abs/1706.02271
 6. Darwish, K., Magdy, W., Zanouda, T.: Trump vs. hillary: What went viral during
    the 2016 us presidential election. In: Proceedings of the 9th International Confer-
    ence on Social Informatics (SocInfo’17). pp. 143–161. Springer (2017)
 7. Lin, J., Efron, M., Wang, Y., Sherman, G.: Overview of the trec-2014 microblog
    track. In: Proceedings of the 23rd Text REtrieval Conference. TREC ’14 (2014)
 8. Lin, J., Efron, M., Wang, Y., Sherman, G., Voorhees, E.: Overview of the trec-2015
    microblog track. In: Proceedings of the 24th Text REtrieval Conference. TREC ’15
    (2015)
 9. Lin, J., Roegiest, A., Tan, L., Richard, M., Voorhees, E., Diaz, F.: Overview of
    the trec 2016 real-time summarization track. In: Proceedings of the 25th Text
    REtrieval Conference. TREC ’16 (2016)
10. Magdy, W., Elsayed, T.: Adaptive method for following dynamic topics on twitter.
    In: Proceedings of the Eighth International AAAI Conference on Weblogs and
    Social Media (ICWSM) (2014)
11. Magdy, W., Elsayed, T.: Unsupervised adaptive microblog filtering for broad dy-
    namic topics. Information Processing & Management 52(4), 513–528 (2016)


                                                                                             7
12. Mazoyer, B., Cagé, J., Hudelot, C., Viaud, M.L.: Real-time collection of reliable
    and representative tweets datasets related to news events. In: Proceedings of the
    first International Workshop on Analysis of Broad Dynamic Topics over Social
    Media: BroDyn’18 (2018)
13. Morris, M.R., Teevan, J., Katrina, P.: A comparison of information seeking using
    search engines and social networks. In: Proceedings of the Fourth International
    AAAI Conference on Weblogs and Social Media. pp. 291–294 (2010)
14. Olteanu, A., Castillo, C., Diaz, F., Vieweg, S.: Crisislex: A lexicon for collecting
    and filtering microblogged communications in crises. In: Proceedings of the Eighth
    International AAAI Conference on Weblogs and Social Media (ICWSM) (2014)
15. Ounis, I., Macdonald, C., Lin, J., Soboroff, I.: Overview of the trec-2011 microblog
    track. In: Proceedings of the 20th Text REtrieval Conference. TREC ’11 (2011)
16. Robertson, S.E., Soboroff, I.: The trec 2002 filtering track report. In: TREC. vol.
    2002, p. 5 (2002)
17. Ruths, D., Pfeffer, J.: Social media for large studies of behavior. Science 346(6213),
    1063–1064 (2014)
18. Soboroff, I., Ounis, I., Lin, J., Macdonald, C.: Overview of the trec-2011 microblog
    track. In: Proceedings of the 21st Text REtrieval Conference. TREC ’12 (2012)
19. Teevan, J., Ramage, D., Morris, M.R.: # twittersearch: a comparison of microblog
    search and web search. In: Proceedings of the fourth ACM international conference
    on Web search and data mining. pp. 35–44. ACM (2011)


                                                                                             8