<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Unlock the Stock: User Topic Modeling for Stock Market Analysis</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Patrick Siehndel</string-name>
          <email>siehndel@L3S.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ujwal Gadiraju</string-name>
          <email>gadiraju@L3S.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>L3S Research Center, Leibniz Universität Hannover</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The increasing use of Twitter as a medium for sharing news on various topics facilitates methods for automatic news creation as well as event detection and prediction. However, these methods are hindered by users who post and propagate incorrect or irrelevant content. Choosing the right users is therefore crucial, both to reduce the number of tweets that have to be analyzed and to preserve the quality of the predicted events or generated news. In this paper, we present an effective method for identifying expert users in defined areas related to the stock market. For each user we generate a model based on the content of their posts. The model represents the domains the user talks about and allows users to be selected for various tasks. We show the effectiveness of the proposed approach through a series of experiments on large Twitter datasets related to stock market companies.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Categories and Subject Descriptors</title>
      <p>H.3.1 [INFORMATION STORAGE AND RETRIEVAL]:
Content Analysis and Indexing; H.3.3 [INFORMATION STORAGE
AND RETRIEVAL]: Information Search and Retrieval</p>
    </sec>
    <sec id="sec-2">
      <title>1. INTRODUCTION</title>
      <p>
        Today a large amount of data is user-generated content produced
within social media networks like Twitter or Facebook. The
availability and abundance of this kind of data has led to various
innovative real world applications. Harnessing sentiments and opinions
that characterize political landscapes, using geolocation of users
for analyzing earthquakes [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ], or investigating social networks for
predicting disease outbreaks and spread [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] are a few examples that
rely on the analysis of social media data.
      </p>
      <p>One of the major challenges when working with social media data
is that the users in the network are not homogeneous. While for
some applications this heterogeneity is useful (for example, to access
multiple standpoints on subjective matters from different groups of
people), for others it is necessary to consider only the posts
from certain user groups. In this paper, we present an approach
for identifying groups of users that are of particular interest
in the realm of stock market analysis. Our method uses
background information from Wikipedia, which allows us to
generate profiles that take into account the semantic background
of the messages posted by a user. Additionally, our method allows us
to create connections between the user profiles and different
areas of the stock market, represented by a hierarchical model. To
generate the user models we consider the textual content of the
messages written by the users as well as the available metadata for
each user. By applying named entity recognition to the
posts we generate semantically enriched messages. These are
aggregated into a user profile that represents the topics and fields a user
writes about, together with an estimate of the user’s expertise in each
field. These profiles also indicate how trustworthy the
messages of a user are in certain domains.</p>
      <p>
        The usefulness of the detection of experts in social networks
related to stock markets has been analyzed by Bar-Haim et al. [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
The authors show that training models to distinguish expert from
non-expert users can improve the quality of predicted stock market
changes. Our work is further motivated by this premise.
We test and evaluate the proposed method on a large dataset
of tweets related to the stock market. Our experiments show
that the right composition of users can help to increase the quality
and effectiveness of event prediction algorithms as well as the
overall analysis of messages related to certain topics. We also show that
this reduces the number of messages that need to be analyzed.
      </p>
      <p>
        The remainder of this paper is structured as follows. In Section 2
we review related work in the areas of topic modeling and user
expertise analysis. In Section 3 we describe in detail how
complete user profiles are built from simple textual messages. In
Sections 4 and 5, we describe a series of experiments that test
the effectiveness of the proposed methods. In Section 6 we
draw conclusions, discuss the implications of our work and present
an outlook on our future work.
      </p>
    </sec>
    <sec id="sec-3">
      <title>2. RELATED WORK</title>
      <p>The related work in our area is split into two different directions:
(i) the area of expert detection, and (ii) the domain of user and topic
modeling in Twitter.</p>
    </sec>
    <sec id="sec-4">
      <title>2.1 Expert Detection</title>
      <p>
        In the area of expert detection the work conducted by Guy et al. [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]
describes a scenario where users within an enterprise network are
analyzed. The scores assigned to users representing their interest
and expertise are based on search terms and documents within an
index. In contrast to our work, the connection between users and
areas of expertise is made by keyword matching. In our work, the
semantic relation between entities related to the topic of interest
and the entities mentioned by the users is used for generating the
relations. Our focus on expertise networks in Twitter also fits into
the research area of automated expert finding. Here, both explicit
and implicit information are used to identify experts in a
particular area. Yimam-Seid and Kobsa [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ] argue that for an effective
use of knowledge within an organization, it is important to exploit
hidden knowledge in various forms. The authors separate the need
for information (the need for people who can provide advice, help or
feedback) from the need for expertise (the need for people who
can perform a social or organizational role). Ghosh et al. [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] used
Twitter content to find experts on a topic. Their results show
that relying on Twitter Lists captures the expertise of a user more
accurately than relying on tweet content alone.
      </p>
    </sec>
    <sec id="sec-5">
      <title>2.2 User and Topic Modeling in Twitter</title>
      <p>
        Abel et al. [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] compared different approaches for extracting
professional interests from social media profiles. They showed that tag-based
profiles and self-created user profiles are most suitable for
this task. Recent work [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] has shown how user trust models can be
used to increase the performance of event detection methods on
Twitter by using textual features and meta-information about the
user. Based on the user profile, a classifier decides whether to take
messages from a user into account or to discard them. This
application is relatively close to the one analyzed in this paper. The
main differences lie in their use of word vectors (whereas we use
detected entities) and in their focus on the trustworthiness of a
user, while we focus on areas of interest and keep users based on
the amount of related information they posted in the past.
Besides the analysis of textual patterns and user information several
works also consider the structural information inherent in social
networks like Twitter. The task of finding content of high quality
has also been analyzed in domains like question answering
communities [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] or forums [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ] which are comparable in several
dimensions as they are also social networks. The use of standard
authority estimation approaches within the network has also been
used to gather experts for specific topics [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]. In [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], Chelaru et al.
analyzed how different user groups are connected to each other and
showed that groups centered around different skills exist within
professional networks. For our analysis these groups are interesting,
since the users in these groups can be considered experts in
the specific domain.
      </p>
    </sec>
    <sec id="sec-6">
      <title>3. METHODOLOGY</title>
      <p>
        In this section we describe how user sets for
defined topics are created. We select special groups of users with
the main goal of reducing content that is not of interest for
our domain: the main topics relevant to systemic risk and
stock market analysis make up only a small fraction of the messages
posted by most users. Additionally, many users have different
purposes in mind when using social media [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. Nevertheless,
Twitter contains a large portion of news-related content [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], which
is of interest for generating topic-specific user models within the
analyzed domain.
      </p>
      <p>Our user model describes the topics a user writes about based on
the Thomson Reuters Business Classification System (TRBC; see
http://thomsonreuters.com/en/products-services/financial/marketindices/business-classification.html).
This system contains a hierarchy of different business sectors, allowing
us to model the interests and expertise of the monitored users in a
way that directly describes and partitions the different sectors of
the stock market. In our profile we map the Economic sectors,
Business sectors and Industries to corresponding Wikipedia
categories. These categories cover various aspects of the
stock market at different granularities. For instance, the Economic
Sector Basic Materials is split into Business Sectors like Chemicals
and Mineral Resources. These are further divided into industries
like Agricultural Chemicals or Steel. Our aggregated user profiles,
together with the different levels of granularity, allow us to find and
monitor users for various domains.</p>
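      <p>The TRBC hierarchy described above can be held in a nested mapping. The Python sketch below is illustrative only: the excerpt is hypothetical (placing Steel under Mineral Resources is an assumption), and it assumes, unlike the paper, which links each TRBC level to its own Wikipedia categories, that coarser levels are obtained by summing industry-level weights. The name roll_up is likewise hypothetical.</p>
      <preformat preformat-type="code">
```python
from collections import defaultdict

# Illustrative excerpt of a three-level TRBC-style hierarchy:
# economic sector -> business sector -> list of industries.
# Assumption: "Steel" is placed under "Mineral Resources" for illustration only.
TRBC_EXCERPT = {
    "Basic Materials": {
        "Chemicals": ["Agricultural Chemicals"],
        "Mineral Resources": ["Steel"],
    },
}


def roll_up(industry_weights, hierarchy):
    """Sum industry-level weights into business-sector and economic-sector weights.

    industry_weights: dict mapping industry name -> profile weight.
    Industries missing from the profile contribute a weight of 0.
    """
    business = defaultdict(float)
    economic = defaultdict(float)
    for econ_sector, business_sectors in hierarchy.items():
        for biz_sector, industries in business_sectors.items():
            for industry in industries:
                weight = industry_weights.get(industry, 0.0)
                business[biz_sector] += weight
                economic[econ_sector] += weight
    return dict(business), dict(economic)


business, economic = roll_up(
    {"Agricultural Chemicals": 0.25, "Steel": 0.5}, TRBC_EXCERPT
)
print(business)  # {'Chemicals': 0.25, 'Mineral Resources': 0.5}
print(economic)  # {'Basic Materials': 0.75}
```
      </preformat>
      <p>Keeping the hierarchy as data lets the same profile be queried at any of the three granularities without recomputing the underlying annotations.</p>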
      <p>
        In our scenario we are looking for expert users within specific
domains. An expert user can be described as a person who has a
deeper knowledge regarding a certain domain than the average user.
In particular we choose the users related to a domain by first
annotating the messages posted by the user, and then calculating their
relevance to the defined domain of interest. The models and
computed connections are comparable to the methods presented in [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]
and [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], albeit without the need for graph walking or
traversal algorithms. Our approach to creating a user profile
consists of the following four main steps.
      </p>
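      <sec id="sec-6-1">
        <list list-type="order">
          <list-item>
            <p>Topic selection</p>
          </list-item>
          <list-item>
            <p>Message enrichment</p>
          </list-item>
          <list-item>
            <p>Entity linking</p>
          </list-item>
          <list-item>
            <p>User finalization</p>
          </list-item>
        </list>
      </sec>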
    </sec>
    <sec id="sec-7">
      <title>3.1 Creating User Profiles</title>
      <p>Topic selection: The Thomson Reuters Business Classification
System forms the basis for the topics of the generated profiles. It
contains a hierarchy of different business sectors, allowing us to model
the interests and expertise of the monitored users in a way that
indicates their relatedness to companies from the different stock
market sectors. We linked the Economic sectors, Business sectors
and Industries of the TRBC to corresponding Wikipedia categories,
and the users’ interests are described in terms of these categories.
As a result, our profile only indicates the relatedness of
the user to different sectors of the stock market. These user profiles,
together with the different levels of granularity, allow us to find and
monitor users for various scenarios.</p>
      <p>
        Message enrichment: In order to relate the messages written by
the analyzed users to Wikipedia, we annotate all tweets of each user using the
Wikipedia Miner Toolkit (http://wikipedia-miner.cms.waikato.ac.nz/) [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. Through named entity recognition, the toolkit links the entities
or concepts mentioned in a user’s tweets to the corresponding Wikipedia
articles. The links discovered by Wikipedia Miner are similar in style to the
links found inside a Wikipedia article: not every word that has a related
Wikipedia article becomes a link, only words that are relevant to the
overall topic. Figure 1 shows an example tweet with the related Wikipedia
articles that are used to build the user profile.
      </p>
      <p>
        Entity linking: The third stage, entity linking, relates the entities
mentioned in a user’s tweets to the previously chosen categories
that represent the different sectors of the stock
market. For each entity we calculate the relatedness to every
article belonging to the chosen categories. As the relatedness
measure we use the one proposed by Milne et al. [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], which is
modeled on the Normalized Google Distance [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] and
calculates the ratio between the inlinks two articles have in
common and the total number of articles linking to them. In order to reduce the
influence of articles that are mentioned several times, we use the logarithm
of the number of articles as a weight for the profiles. The relatedness
of two articles a and b is defined as follows.
      </p>
      <sec id="sec-7-1">
        <disp-formula>
          <tex-math><![CDATA[R(a,b) = \frac{\log \max(|A|,|B|) - \log |A \cap B|}{\log |W| - \log \min(|A|,|B|)}]]></tex-math>
        </disp-formula>
        <p>Here, A and B are the sets of articles that link to a and b, respectively, and W is the set of all Wikipedia articles.</p>
        <p>User finalization: In the final stage, we aggregate
over all tweets and related entities of a user in order to
generate the final user profile. The generated profile shows the
topics a user talks about and, based on the number of followers or on how
focused a user is on a certain topic, also provides an estimate of
the user’s expertise in this domain.</p>
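        <p>The measure can be computed directly from inlink sets. In the following Python sketch, the inlink sets and the article count |W| are assumed to be available (how they are obtained, for example from a Wikipedia link dump, is outside the scope of the sketch). As written, R(a, b) is 0 when both articles have identical inlink sets and grows as their overlap shrinks, so it behaves like a distance; the helper relatedness() turns it into a similarity using 1 − R clipped at 0, which is an assumed convention that the text does not specify. Both function names are hypothetical.</p>
        <preformat preformat-type="code">
```python
import math


def milne_witten_distance(inlinks_a, inlinks_b, total_articles):
    """Compute R(a, b) from the formula above.

    inlinks_a, inlinks_b: sets of ids of the articles that link to a and to b.
    total_articles: |W|, the number of articles in Wikipedia.
    Returns None if the articles share no inlinks, since log(0) is undefined.
    """
    common = len(inlinks_a.intersection(inlinks_b))
    if common == 0:
        return None
    larger = max(len(inlinks_a), len(inlinks_b))
    smaller = min(len(inlinks_a), len(inlinks_b))
    return (math.log(larger) - math.log(common)) / (
        math.log(total_articles) - math.log(smaller)
    )


def relatedness(inlinks_a, inlinks_b, total_articles):
    """Turn the distance into a similarity in [0, 1] (assumed convention: 1 - R)."""
    distance = milne_witten_distance(inlinks_a, inlinks_b, total_articles)
    if distance is None:
        return 0.0
    return max(0.0, 1.0 - distance)


a = {1, 2, 3, 4}
b = {3, 4, 5, 6}
print(milne_witten_distance(a, b, 1000))  # equals log(2) / log(250), about 0.126
print(relatedness(a, a, 1000))            # 1.0
```
        </preformat>
        <p>Returning None for disjoint inlink sets keeps the undefined logarithm explicit instead of silently producing an infinite distance.</p>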
        <p>Algorithm 1 depicts the steps in the creation of a user profile. In
line 10 we perform the main update of the profile using the related
category and the calculated relatedness.</p>
        <p>The nested loops in the algorithm are computationally expensive. To speed up the
process, our implementation contains a buffer that stores the weights of all categories
for every entity. This reduces the three inner loops to a single lookup in a hashmap,
which makes the method applicable to very large sets of users and tweets and thus
keeps our approach scalable.</p>
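        <p>The effect of the buffer can be illustrated with a small Python sketch of the profile update. It assumes two hypothetical callables: get_annotations, standing in for the Wikipedia Miner annotation of a tweet, and category_weights_for, standing in for the expensive inner loops of Algorithm 1 that compute the relatedness between an entity and every profile category. Each entity’s category weights are computed once and then reused through a dictionary (hashmap) lookup. Simple additive accumulation is an assumption, since the exact update rule, including the logarithmic weighting, is not fully specified.</p>
        <preformat preformat-type="code">
```python
from collections import defaultdict


def build_profile(tweets, categories, get_annotations, category_weights_for):
    """Aggregate one user's tweets into a category profile (sketch of Algorithm 1).

    tweets: the user's tweet texts.
    categories: the profile categories (e.g. TRBC-linked Wikipedia categories).
    get_annotations: hypothetical callable, text -> list of linked entities.
    category_weights_for: hypothetical callable, (entity, categories) ->
        dict mapping category -> relatedness weight (the costly inner loops).
    """
    buffer = {}  # entity -> {category: weight}, filled on first use
    profile = defaultdict(float)
    for text in tweets:
        for entity in get_annotations(text):
            if entity not in buffer:
                buffer[entity] = category_weights_for(entity, categories)
            # A single hashmap lookup replaces the inner loops for repeated entities.
            for category, weight in buffer[entity].items():
                profile[category] += weight  # corresponds to updateProfile
    return dict(profile), buffer


TOY_WEIGHTS = {
    "Apple_Inc.": {"Computer Hardware": 0.9, "Software": 0.6},
    "Ford_Motor_Company": {"Automobiles": 0.95},
}
calls = []


def toy_weights(entity, categories):
    calls.append(entity)  # record how often the expensive step runs
    return {c: w for c, w in TOY_WEIGHTS.get(entity, {}).items() if c in categories}


profile, buffer = build_profile(
    ["Apple_Inc. Ford_Motor_Company", "Apple_Inc."],
    {"Computer Hardware", "Software", "Automobiles"},
    str.split,  # toy annotator: every whitespace-separated token is an entity
    toy_weights,
)
print(calls)  # ['Apple_Inc.', 'Ford_Motor_Company']
```
        </preformat>
        <p>In the example, the second mention of Apple_Inc. is served from the buffer, so the costly step runs once per distinct entity rather than once per mention.</p>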
      </sec>
    </sec>
    <sec id="sec-8">
      <title>4. EXPERIMENTS FOR USER MODELING</title>
      <p>In this section we describe the experimental evaluation performed
to analyze our generated user models. The evaluation consists of two parts.
In the first set of experiments we
evaluate the generated user models as they are, without correlating them
with additional information. This set of experiments gives us an
overview of how the analyzed users behave in general and how
their topics of interest influence their position within the network.
We also evaluate the temporal stability of the generated profiles in
order to give recommendations for future use of the proposed
methods. In the second set of experiments, we evaluate the usefulness
of the generated profiles for applications
related to event detection and show how the generated profiles
influence the results.
      </p>
    </sec>
    <sec id="sec-9">
      <title>Algorithm 1: Update of User Profiles</title>
      <preformat>Input: T: set of tweets of a user; C: set of categories for the profile
 1  foreach c_k ∈ C do
 2      IL_C ← (c_k, getInlinks(c_k))
 3  foreach T_i ∈ T do
 4      A_i ← getAnnotations(T_i)
 5      foreach a_j ∈ A_i do
 6          IL_Ai ← getInlinks(a_j)
 7          foreach il_ak ∈ IL_Ai do
 8              foreach il_ck ∈ IL_C do
 9                  R ← calculateRelatedness(il_ck, a_j)
10                  updateProfile(il_ck, R)</preformat>
    </sec>
    <sec id="sec-12">
      <title>4.1 Evaluating the Users</title>
      <p>The first set of experiments focuses on the evaluation of the methods
described for modeling user interest and expertise, and of the resulting
user profiles. In this section we aim to answer two main research
questions.</p>
      <p>RQ#1. How broad are the topics that users talk about,
and how are these topics interconnected? We analyze whether
there are major differences between users regarding the
variance of the topics they tweet about or, more precisely, how
focused the users are on a certain topic. We also
evaluate how the different topics within the evaluated realm
are connected.</p>
      <p>RQ#2. How stable are the generated user profiles over
time? With large numbers of users, it
becomes crucial to reduce the time spent recalculating or
updating the user profiles with every new post. With this question
we want to determine how stable user profiles are over time,
which allows us to estimate the best time spans for periodically
updating the profiles.</p>
    </sec>
    <sec id="sec-13">
      <title>4.2 Dataset</title>
      <p>The dataset gathered for our experiments consists of around 5.3
million tweets related to more than 3,000 stocks, mainly listed on
the NYSE. These tweets were collected using the Twitter Streaming
API together with a set of filters. The filters were selected based on
the stock symbols of the different companies (for instance, $AAPL
for Apple). The idea behind this approach is to collect
tweets that have a direct relation to a stock market company
and are thereby of high interest to our domain. The collected tweets
were posted by 333,704 different users, who posted more than 2.4
billion tweets overall (see Table 1).</p>
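      <p>The symbol-based filter can be mirrored on the client side to check whether a collected tweet mentions a tracked company. The Python sketch below assumes that a cashtag is a dollar sign followed by one to five letters, which is a simplification (some tickers contain dots or digits); the pattern and the function names are illustrative and are not part of the Twitter API.</p>
      <preformat preformat-type="code">
```python
import re

# Assumed cashtag shape: "$" followed by 1-5 letters, ending at a word boundary.
CASHTAG = re.compile(r"\$([A-Za-z]{1,5})\b")


def cashtags(text):
    """Return the set of upper-cased ticker symbols written as cashtags."""
    return {symbol.upper() for symbol in CASHTAG.findall(text)}


def mentions_tracked_stock(text, tracked_symbols):
    """True if the tweet contains at least one tracked symbol, e.g. {"AAPL"}."""
    return not cashtags(text).isdisjoint(tracked_symbols)


print(sorted(cashtags("Long $AAPL, short $tsla; $100 is not a ticker")))  # ['AAPL', 'TSLA']
```
      </preformat>
      <p>Upper-casing the matches means that $aapl and $AAPL are counted as the same symbol.</p>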
      <p>From this large set of users we randomly selected
10,000 active users (i.e., users with more than 100 followers and more than 100 tweets)
for further analysis. For each of these users we downloaded up to
2,400 of their most recent tweets using the Twitter API, resulting in a
dataset of 12,308,376 tweets. The tweets were annotated using the
described method, resulting in more than 50 million
annotations, on which the user profiles were based.</p>
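      <p>The selection of active users amounts to a filter followed by uniform random sampling. In this Python sketch the User record and its field names are hypothetical, and the fixed seed only serves reproducibility; the thresholds follow the definition in the text (more than 100 followers and more than 100 tweets).</p>
      <preformat preformat-type="code">
```python
import random
from dataclasses import dataclass


@dataclass
class User:
    user_id: int
    followers: int
    tweets: int


def is_active(user, min_followers=100, min_tweets=100):
    # Strictly more than the thresholds, as in the definition of active users.
    return user.followers > min_followers and user.tweets > min_tweets


def sample_active_users(users, k, seed=0):
    """Draw up to k active users uniformly at random, without replacement."""
    active = [u for u in users if is_active(u)]
    rng = random.Random(seed)
    return rng.sample(active, min(k, len(active)))


users = [User(1, 150, 300), User(2, 100, 500), User(3, 2000, 101)]
print(sorted(u.user_id for u in sample_active_users(users, 10)))  # [1, 3]
```
      </preformat>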
    </sec>
    <sec id="sec-14">
      <title>4.3 Topics and Expertise</title>
      <p>Table 2 shows how strongly the top 50 experts of the different
domains focus on their topics, reported as the
percentage of the profile weight held by the top categories. We can
see that domain experts in the area of Pharmacology are very
focused on this topic. As expected, the domain experts show a strong
focus on their area of expertise compared to an average
user.</p>
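      <p>A focus score of this kind can be computed as the share of a profile’s total weight held by its heaviest categories. The Python sketch below assumes that profiles are dictionaries of non-negative category weights; the number of top categories used for Table 2 is not stated, so it appears here as a parameter k, and the function name is hypothetical.</p>
      <preformat preformat-type="code">
```python
def top_share(profile, k=1):
    """Fraction of the total profile weight held by the k heaviest categories.

    profile: dict mapping category -> non-negative weight.
    Returns 0.0 for an empty or all-zero profile.
    """
    total = sum(profile.values())
    if total == 0:
        return 0.0
    heaviest = sorted(profile.values(), reverse=True)[:k]
    return sum(heaviest) / total


expert = {"Pharmacology": 6.0, "Biotechnology": 2.0, "Software": 2.0}
print(top_share(expert, k=1))  # 0.6
```
      </preformat>
      <p>With k equal to 1, a score near 1 marks a user whose profile is dominated by a single category, while an average user spread over many topics scores lower.</p>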
      <p>Additionally, we analyzed how the topics the users talk about are
connected to each other, as shown in Figure 2. The diagram shows
the Pearson correlation between the different industries based on
the user profiles. We can see some strong correlations within the
different sectors. This is expected, since industries
like Computer Hardware and Computer Software are closely related.
It is interesting to see that there are also relations between
some of the other sectors. For instance, the industries around healthcare
are also connected to the industries around food and chemistry, and
the industries around computer hardware show some connections
to entertainment. These connections can help to find users who
possess some domain knowledge in a certain area even though their
posting behavior does not directly indicate this.</p>
    </sec>
    <sec id="sec-15">
      <title>4.4 Temporal Aspects and Stability of Profiles</title>
      <p>Finally, we focus on temporal aspects of the profiles and analyze
how stable the profiles are over time. This allows us to estimate
the required update interval for different user groups. In order to
understand and analyze the evolution of user profiles, we consider
the top-100 users for every domain. We then compared the
profiles generated from the first and the second half of the collected
tweets of these users, calculating the Pearson correlation
between the two profiles to measure how similar they are. The results
are presented in Table 4. We clearly see that the profiles of expert
users are more stable than those of average users. This indicates
that users who write only about a certain topic continue to do so.
The average time between the two profiles was measured as the
difference between the timestamps of the first tweets of the two halves. For
the tweets we crawled, this time varies between 100 and 200 days.
Over this timespan the user profiles are very stable, so the
profiles do not need to be updated very frequently.</p>
    </sec>
    <sec id="sec-16">
      <title>4.5 Discussion</title>
      <p>In this section, we evaluated different properties of the generated
user profiles. We showed that the topics users talk about are
relatively broad on average. However, our models allow us to
separate out domain experts who mainly focus on one topic or a set
of connected topics. Our last series of experiments showed that
the topics users write about are relatively stable, indicating that
frequent updates or recrawling of user data are not required.</p>
    </sec>
    <sec id="sec-18">
      <title>5. EXPERIMENTS FOR TOPIC MODELING AND EVENT DETECTION</title>
      <p>In this section, we describe a series of experiments showing
how the expert detection methods can improve the effectiveness of
event detection and trend modeling in the area of stock market
analysis. The main research questions addressed by these experiments
are:</p>
      <p>RQ#1. Which users talk about which companies? We can
distinguish users based on their calculated interests and
posting behavior in certain domains and industry sectors. Based
on this, we analyze how effective it is to select certain users
for a specific company. We also analyze how many of the
related messages for this company we can retrieve by
selecting a few expert users.</p>
      <p>RQ#2. Are small sets of expert users suitable for event
detection and prediction? Our dataset contains tweets related
to stocks, and these stocks are related to events. By analyzing
the tweets from different user groups we can arrive at different
predictions. We analyze the timeframes in which expert
users talk about a certain stock and the timeframes in which
“normal” users mention this stock.</p>
    </sec>
    <sec id="sec-19">
      <title>5.1 Dataset</title>
      <p>For this series of experiments we use the same dataset as described
in Section 4. Additionally, we generated profiles for 10
companies from different sectors. These profiles
were generated based on the Wikipedia pages of the corresponding
companies and the outlinks of these pages. Two example profiles,
for the companies “Ford” and “Merck”, are shown in Figure 3. We
can see a strong focus on the areas of Automobiles and
Pharmacology, respectively, which are the main areas of interest for these two companies.
For further evaluation we selected 10,000 users out of the 333,000
users in our dataset. These users were randomly selected under
the constraints that they had more than 100 tweets and more
than 100 followers, since we intend to focus on active users in order to
have a comparable baseline. For each of the chosen companies,
we collected the sets of the 100 and 1,000 most similar users based on
the Pearson correlation between the generated profiles of users and
companies. To address our first research question, we analyzed the
percentage of users in each group who had mentioned one of the
analyzed stock names within the monitored period. The results are
shown in Table 5. When comparing the top users or the very active
users with all users, it is evident that most of the content is posted
by the top users. Only highly popular companies like Amazon or
Apple were mentioned by a large fraction of average users. In two of
the analyzed cases the selected user groups did not match the users
talking about the stock well, so the percentage of users within the
top expert users was smaller than the percentage within the set of
active users. For the other companies, the sets of top users contained
a higher percentage of users who talked about the company.</p>
      <p>In order to assess how useful a selection of specific users can be in
the area of event detection, we looked up events for all companies
on Yahoo Finance (http://finance.yahoo.com/). For each company we chose the 3 events
with the strongest impact on the number of tweets posted per day.
All of these were events that may have a direct influence on the stock
market, such as the announcement of quarterly reports. Table
6 shows the companies we used and the number of tweets found
within our original dataset.</p>
      <sec id="sec-19-1">
        <p>Each of the chosen events shows a clear spike in
the number of tweets containing the stock symbol of the company.
Figure 4 shows one of our example events. All three
groups of users show the same pattern, which indicates that all three
groups of users could be used for detecting the related events.</p>
        <p>Table 7 shows the overall results of our event detection
experiments. Since the main focus of this paper is not the evaluation
of event detection algorithms, we chose a simple
metric to estimate how well event detection algorithms could work
on the different time series: the ratio between the
number of tweets on the event days and the average number of posts
per day. For example, a ratio of 7 indicates that on the days of the
events, 7 times more tweets were posted than on a normal
day without a special event. Based on this metric,
the user groups chosen by expertise and topic outperform the active
users by a wide margin. This indicates that, for these groups of users, event
detection on the generated time series becomes considerably easier,
since the events generate spikes that rise further above the average
and are therefore easier to distinguish from the surrounding noise.</p>
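        <p>The spike metric can be written down in a few lines of Python. This sketch assumes that daily tweet counts are already available per user group and, following the phrase “a normal day without a special event”, excludes the event days from the baseline by default; the text does not state this explicitly, so it is exposed as a parameter. The function and parameter names are hypothetical.</p>
        <preformat preformat-type="code">
```python
from statistics import mean


def spike_ratio(daily_counts, event_days, exclude_events=True):
    """Mean tweet count on event days divided by the mean count on other days.

    daily_counts: dict mapping day -> number of tweets posted that day.
    event_days: the days on which the selected events happened.
    exclude_events: if False, the baseline averages over all days instead.
    """
    events = set(event_days)
    baseline_days = [
        count
        for day, count in daily_counts.items()
        if not (exclude_events and day in events)
    ]
    return mean(daily_counts[day] for day in events) / mean(baseline_days)


counts = {"d1": 10, "d2": 10, "d3": 70, "d4": 10}
print(spike_ratio(counts, ["d3"]))  # 7.0
```
        </preformat>
        <p>Setting exclude_events to False averages the baseline over all days, which lowers the ratio (2.8 instead of 7.0 in the toy series).</p>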
      </sec>
    </sec>
    <sec id="sec-20">
      <title>5.2 Discussion</title>
      <p>We evaluated how useful the generated sets of users are in terms
of the amount of content we can collect per company and in terms
of event detection for the different companies. The experiments
showed that selecting enough users is not trivial when small
companies are monitored. In these scenarios, more
general users might need to be selected. For most of the analyzed
companies we found sets of users that were large enough to
analyze the tweets posted by these users, and we evaluated how
useful these tweets are in an event detection scenario. For event
detection, special groups of expert users tended to
post more focused, event-related content, which makes these
events easier to detect when only these users are monitored.</p>
    </sec>
    <sec id="sec-21">
      <title>6. CONCLUSION &amp; FUTURE WORK</title>
      <p>In this paper, we presented and evaluated a method for generating
profiles of social network users in the domain of stock
market analysis. We evaluated the generated
models by analyzing how users who are selected based on their
user models perform in different tasks compared to normal users
and very active users. Our findings clearly indicate that using
expert users selected with our proposed approach has the following
benefits.</p>
      <p>This approach allows us to obtain high-quality content related
to the domain or company of interest.</p>
      <p>We can attain a high quality of content while optimizing the
set of users.</p>
      <p>We showed that in the area of event detection, the use of a
relatively small set of expert users facilitates the detection of
relevant events and can improve the event detection quality.</p>
      <p>In our future work we will delve further into the content
of the posts from different user groups. Especially in the area of
sentiment analysis, combined with the correlation between
social network content and stock market data, further evaluations
that distinguish expert users from normal users are of interest.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>F.</given-names>
            <surname>Abel</surname>
          </string-name>
          , E. Herder, and
          <string-name>
            <given-names>D.</given-names>
            <surname>Krause</surname>
          </string-name>
          .
          <article-title>Extraction of professional interests from social web profiles</article-title>
          .
          <source>Proc. UMAP</source>
          ,
          <volume>34</volume>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>E.</given-names>
            <surname>Agichtein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Castillo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Donato</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gionis</surname>
          </string-name>
          , and
          <string-name>
            <given-names>G.</given-names>
            <surname>Mishne</surname>
          </string-name>
          .
          <article-title>Finding high-quality content in social media</article-title>
          .
          <source>In Proceedings of the 2008 International Conference on Web Search and Data Mining</source>
          , pages
          <fpage>183</fpage>
          -
          <lpage>194</lpage>
          . ACM,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>R.</given-names>
            <surname>Bar-Haim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Dinur</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Feldman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Fresko</surname>
          </string-name>
          , and
          <string-name>
            <given-names>G.</given-names>
            <surname>Goldstein</surname>
          </string-name>
          .
          <article-title>Identifying and following expert investors in stock microblogs</article-title>
          .
          <source>In Proceedings of the Conference on Empirical Methods in Natural Language Processing</source>
          , pages
          <fpage>1310</fpage>
          -
          <lpage>1319</lpage>
          . Association for Computational Linguistics,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>T.</given-names>
            <surname>Bodnar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Tucker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Hopkinson</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S. G.</given-names>
            <surname>Bilen</surname>
          </string-name>
          .
          <article-title>Increasing the veracity of event detection on social media networks through user trust modeling</article-title>
          .
          <source>In Big Data (Big Data), 2014 IEEE International Conference on</source>
          , pages
          <fpage>636</fpage>
          -
          <lpage>643</lpage>
          . IEEE,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>S.</given-names>
            <surname>Chelaru</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Herder</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. D.</given-names>
            <surname>Naini</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.</given-names>
            <surname>Siehndel</surname>
          </string-name>
          .
          <article-title>Recognizing skill networks and their specific communication and connection practices</article-title>
          .
          <source>In Proceedings of the 25th ACM conference on Hypertext and social media</source>
          , pages
          <fpage>13</fpage>
          -
          <lpage>23</lpage>
          . ACM,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>R.</given-names>
            <surname>Cilibrasi</surname>
          </string-name>
          and
          <string-name>
            <given-names>P. M. B.</given-names>
            <surname>Vitányi</surname>
          </string-name>
          .
          <article-title>The Google similarity distance</article-title>
          .
          <source>IEEE Trans. Knowl. Data Eng.</source>
          ,
          <volume>19</volume>
          (
          <issue>3</issue>
          ):
          <fpage>370</fpage>
          -
          <lpage>383</lpage>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>E.</given-names>
            <surname>Diaz-Aviles</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Stewart</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Velasco</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Denecke</surname>
          </string-name>
          , and
          <string-name>
            <given-names>W.</given-names>
            <surname>Nejdl</surname>
          </string-name>
          .
          <article-title>Epidemic intelligence for the crowd, by the crowd</article-title>
          .
          <source>In ICWSM</source>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>S.</given-names>
            <surname>Ghosh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Sharma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Benevenuto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ganguly</surname>
          </string-name>
          , and
          <string-name>
            <given-names>K.</given-names>
            <surname>Gummadi</surname>
          </string-name>
          .
          <article-title>Cognos: crowdsourcing search for topic experts in microblogs</article-title>
          .
          <source>In Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval</source>
          , pages
          <fpage>575</fpage>
          -
          <lpage>590</lpage>
          . ACM,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>I.</given-names>
            <surname>Guy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>U.</given-names>
            <surname>Avraham</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Carmel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ur</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Jacovi</surname>
          </string-name>
          , and
          <string-name>
            <given-names>I.</given-names>
            <surname>Ronen</surname>
          </string-name>
          .
          <article-title>Mining expertise and interests from social media</article-title>
          .
          <source>In Proceedings of the 22nd international conference on World Wide Web</source>
          , pages
          <fpage>515</fpage>
          -
          <lpage>526</lpage>
          . International World Wide Web Conferences Steering Committee,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>R.</given-names>
            <surname>Kawase</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Siehndel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. P.</given-names>
            <surname>Nunes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Herder</surname>
          </string-name>
          , and
          <string-name>
            <given-names>W.</given-names>
            <surname>Nejdl</surname>
          </string-name>
          .
          <article-title>Exploiting the wisdom of the crowds for characterizing and connecting heterogeneous resources</article-title>
          .
          <source>In HT</source>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>H.</given-names>
            <surname>Kwak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Park</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Moon</surname>
          </string-name>
          .
          <article-title>What is Twitter, a social network or a news media?</article-title>
          <source>In Proceedings of the 19th international conference on World Wide Web</source>
          , pages
          <fpage>591</fpage>
          -
          <lpage>600</lpage>
          . ACM,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>A.</given-names>
            <surname>Lenhart</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Purcell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Smith</surname>
          </string-name>
          , and
          <string-name>
            <given-names>K.</given-names>
            <surname>Zickuhr</surname>
          </string-name>
          .
          <article-title>Social media &amp; mobile internet use among teens and young adults. Millennials</article-title>
          .
          <source>Pew Internet &amp; American Life Project</source>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>D.</given-names>
            <surname>Milne</surname>
          </string-name>
          and
          <string-name>
            <given-names>I. H.</given-names>
            <surname>Witten</surname>
          </string-name>
          .
          <article-title>Learning to link with Wikipedia</article-title>
          .
          <source>In Proceedings of the 17th ACM conference on Information and knowledge management</source>
          , pages
          <fpage>509</fpage>
          -
          <lpage>518</lpage>
          . ACM,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>D. N.</given-names>
            <surname>Milne</surname>
          </string-name>
          and
          <string-name>
            <given-names>I. H.</given-names>
            <surname>Witten</surname>
          </string-name>
          .
          <article-title>An open-source toolkit for mining Wikipedia</article-title>
          .
          <source>Artif. Intell.</source>
          ,
          <volume>194</volume>
          :
          <fpage>222</fpage>
          -
          <lpage>239</lpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>T.</given-names>
            <surname>Sakaki</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Okazaki</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Y.</given-names>
            <surname>Matsuo</surname>
          </string-name>
          .
          <article-title>Earthquake shakes Twitter users: real-time event detection by social sensors</article-title>
          .
          <source>In Proceedings of the 19th international conference on World Wide Web</source>
          , pages
          <fpage>851</fpage>
          -
          <lpage>860</lpage>
          . ACM,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>P.</given-names>
            <surname>Siehndel</surname>
          </string-name>
          and
          <string-name>
            <given-names>R.</given-names>
            <surname>Kawase</surname>
          </string-name>
          .
          <article-title>Twikime! - user profiles that make sense</article-title>
          .
          <source>In International Semantic Web Conference (Posters &amp; Demos)</source>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>R.</given-names>
            <surname>Yeniterzi</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.</given-names>
            <surname>Callan</surname>
          </string-name>
          .
          <article-title>Constructing effective and efficient topic-specific authority networks for expert finding in social media</article-title>
          .
          <source>In Proceedings of the first international workshop on Social media retrieval and analysis</source>
          , pages
          <fpage>45</fpage>
          -
          <lpage>50</lpage>
          . ACM,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>D.</given-names>
            <surname>Yimam-Seid</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Kobsa</surname>
          </string-name>
          .
          <article-title>Expert-finding systems for organizations: Problem and domain analysis and the DEMOIR approach</article-title>
          .
          <source>Journal of Organizational Computing and Electronic Commerce</source>
          ,
          <volume>13</volume>
          (
          <issue>1</issue>
          ):
          <fpage>1</fpage>
          -
          <lpage>24</lpage>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. S.</given-names>
            <surname>Ackerman</surname>
          </string-name>
          , and
          <string-name>
            <given-names>L.</given-names>
            <surname>Adamic</surname>
          </string-name>
          .
          <article-title>Expertise networks in online communities: structure and algorithms</article-title>
          .
          <source>In Proceedings of the 16th international conference on World Wide Web</source>
          , pages
          <fpage>221</fpage>
          -
          <lpage>230</lpage>
          . ACM,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>