<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Discovering the Dynamics of Terms' Semantic Relatedness through Twitter</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Nikola Milikic</string-name>
          <email>nikola.milikic@gmail.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jelena Jovanovic</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Milan Stankovic</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>STIH, Université Paris-Sorbonne</institution>
          ,
          <addr-line>28 rue Serpente, 75006 Paris</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Belgrade</institution>
          ,
          <addr-line>Jove Ilica 154, 11000 Belgrade</addr-line>
          ,
          <country country="RS">Serbia</country>
        </aff>
      </contrib-group>
      <fpage>57</fpage>
      <lpage>68</lpage>
      <abstract>
        <p>Determining the semantic relatedness (SR) of two terms has long been an appealing topic in information retrieval, as such information is useful for tasks ranging from tag recommendation, through search query refinement, to suggesting new web resources for the user to discover. Most approaches consider the SR of terms as static over time, and disregard possible temporal changes as imperfections. However, detecting and tracing changes in the SR of terms over time may help in understanding the nature of changes in public opinion, as well as changes in the usage of terms in common language and jargon. In this paper, we propose an approach that makes use of micropost data to establish a dynamic measure of the SR of terms, i.e., a measure that accounts for changes in SR over time. We propose different usage scenarios (in online advertising and organizational knowledge management) which demonstrate the applicability of our approach in real-life situations. We also provide a demo application for visualizing the change in micropost-based SR of terms.</p>
      </abstract>
      <kwd-group>
        <kwd>Semantic relatedness</kwd>
        <kwd>dynamic measure of semantic relatedness</kwd>
        <kwd>microposts</kwd>
        <kwd>Twitter</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>Many research papers (such as in Wagner [1]) claim that Twitter and similar
microblogging services have become a valuable source of knowledge, and have tried to
extract this knowledge and use it for various purposes, such as the creation of dynamic
domain models suitable for semantic analysis and annotation of real-time data [2],
modeling of users’ interests and finding experts [3], etc. However, from our point of
view, the real-time nature of Twitter and Twitter-like services has not yet been
explored to its full extent.</p>
      <p>Most approaches exploit the mass of data that users generate on real-time services as
their most valuable feature. We believe that there is a significant value in the fact that
tweets (and microposts in general), posted frequently and massively, represent the
moment in which they are created and the characteristics of that moment. Therefore, we
have been exploring how these real-time services can support the detection of changes
in semantics of terms, by enabling one to observe the changes of a term’s use over time.
We focus particularly on the semantic relatedness (SR) of terms which is also subject to
temporal changes.</p>
      <p>The scope of meaning of a certain term is always defined in a social circle in which
that meaning emerges, is agreed upon and accepted. Knowing that social systems are
dynamic, it is difficult to neglect the natural changes (i.e., evolution) in socially agreed
upon meaning of terms. If the meaning of a term is changing over time, so is the
relatedness of that term to other terms whose dynamics in the given time period might
be different. The most basic illustration of this is the term totalitarian regime. It is
reasonably close to the terms identifying particular totalitarian governments and
dictators of particular countries. However, this proximity should decrease if the
totalitarian regime in a country is replaced by a democratic government – which
happens more and more often in recent times.</p>
      <p>Although tendencies in the public expressions can easily be detected through search
query frequencies and trending topics on Twitter, the nature of tendencies and their
mutual relationships are not directly evident from such observations. We could imagine
having three trending topics on Twitter: Egypt, revolution, and Britney Spears.
Although a human may grasp that it is more likely that the revolution is happening in
Egypt and not that Britney Spears is leading a revolution, for a computer, this is far less
obvious. The change of SR, however, could indicate the rationale for the rising public
interest in a particular term. For instance, we could see that the recent popularity of the
term Egypt might have been related to the temporary increase of SR of the terms Egypt and
revolution, and that it had nothing to do with the rise in popularity of the term Britney
Spears. Spotting the change in SR of terms could thus help to give meaning to the
observed trends in Web content, and enable machines to grasp this meaning and take
advantage of it in many real life scenarios.</p>
      <p>In this paper, we present our initial research on using real-time services in general,
and Twitter in particular, to detect the changes in SR of terms. We also explore the
scenarios where reacting to those changes might be beneficial. In Section 2, we present
the state of the art in research on SR of terms as well as in using Twitter to detect
tendencies and make use of them. Section 3 introduces our measure of SR based on
micropost data – Normalized Micropost Distance, whereas Section 4 gives some
suggestions on how the relevancy of the change in SR of terms could be detected. We
present our application for testing the proposed approach in Section 5, and consider the
potential usage scenarios in Section 6. In Section 7, we give some interesting examples
of changes in terms’ SR which we have observed by using our application. Section 8
concludes the paper with proposals for future work that will help our initial research
mature.</p>
    </sec>
    <sec id="sec-2">
      <title>2 State of the Art</title>
      <p>The problem of determining semantic relatedness of terms has been studied for decades,
in various contexts and using different approaches. Semantically related terms have
been used to help users choose the right tags in collaborative filtering systems [4]; to
#MSM2011
discover alternative search queries [5]; for query refinement [6]; to enhance expert
finding results [7]; for ontology maintenance [8] [9], and in many other scenarios.</p>
      <p>
        Different techniques and different sources have been used and combined to develop
measures of semantic relatedness (MSRs). These measures could be split into three
major categories: 1) net-based measures, 2) distributional measures and 3)
Wikipedia-based measures [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. In what follows we briefly examine each category of MSRs.
      </p>
      <p>
        Net-based measures make use of semantic (e.g., hyponymy or meronymy) and/or
lexical (e.g., synonyms) relationships within a network (graph) of concepts to determine
semantic proximity between the concepts. For example, Burton-Jones et al. [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] exploit
the hypernym graphs of WordNet<sup>1</sup>; Safar et al. [6] use a Galois lattice to provide
recommendations based on domain ontologies, whereas Ziegler et al. [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] and Resnik
[
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] use the ODP taxonomy<sup>2</sup>. This category also includes measures that rely on the
graph structure of concepts to determine semantic relatedness of those concepts.
Shortest path is among the most common of such measures. It is often enhanced by
taking into account the informational content of the nodes in the graph [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ].
      </p>
      <p>
        Distributional measures rely on the distributional properties of words in large text
corpora. Such MSRs deduce semantic relatedness by leveraging co-occurrences of
concepts. For example, the approach presented in Salton et al. [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] uses co-occurrence
in the text of research papers, weighted by a function derived from the tf-idf measure, to
establish a notion of word proximity. Co-occurrence in tags [4] and in search results
[
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] is also commonly used. In Strube et al. [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ], the authors introduced Normalized
Web Distance (NWD) as a generalization of Normalized Google Distance (NGD) MSR
and investigated its performance with six different search engines. The evaluation
(based on the correlation with human judgment) demonstrated the best performance of
Exalead-based NWD measure, closely followed by Yahoo!, AltaVista, Ask and Google;
only Live Search and Clusty showed significantly lower results.
      </p>
      <p>
        As its name suggests, the third category of MSRs – Wikipedia-based measures –
makes use of Wikipedia as the resource for computing semantic relatedness and often
combines the features of the previous two MSR groups. For example, [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ] relies on the
graph of Wikipedia categories, whereas Waltinger et al. [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] rely on co-occurrence of
words in the text of Wikipedia pages, combined with the information about the
categories of pages in Wikipedia to compute semantic relatedness.
      </p>
      <p>
        In Waltinger et al. [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], the authors report on a comparative analysis of a large
number of MSRs (at least 4 algorithms from each major category of MSRs were
included in the study, resulting in sixteen algorithms in total). The most important
results could be summarized as follows: 1) small, hand-crafted and structured resources
(e.g., WordNet) are inferior to large and semi-structured (i.e., Wikipedia) or even
unstructured resources (i.e., plain text); 2) the distributional MSRs (especially measures
like Latent Semantic Analysis) perform significantly better than the net-based measures
and those using explicit categorical information; 3) MSRs that use the Web as a corpus
were inferior to those operating on smaller but better controlled training corpora (e.g.,
Normalized Distance based on Wikipedia significantly outperformed NGD).
      </p>
      <p>
        Most of the existing approaches do not take into account the dynamic nature of
semantic relatedness between terms.
<fn><label>1</label><p>http://wordnet.princeton.edu/</p></fn>
<fn><label>2</label><p>http://www.dmoz.org/</p></fn>
An exception would be the work presented in
Nagarajan et al. [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ] where the authors identify ‘strong descriptors’
of an event by querying Google Insights to get the terms with which the event’s name was queried
most often (referred to as ‘seed keywords’). Afterwards, they query Twitter to get the
tweets containing the seed keywords and extract the strong descriptors from them.
However, this approach does not measure SR between two specific terms, but rather
identifies terms relevant to the name of the event being examined. Other approaches even
cite the stability of their measure over time as evidence of the soundness of their
approach [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ].
      </p>
      <p>
        On the other hand, many approaches exist for extracting meaning from Twitter
[
        <xref ref-type="bibr" rid="ref20">20</xref>
        ][1]. Some of them make extensive use of Twitter dynamics, like the approach for
detecting events through peaks of word popularity [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ]. Most related to our work is the
approach presented in Song et al. [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ] which relies on spatio-temporal characteristics of
topics mined from Twitter data, for the calculation of semantic relatedness among
topics. The temporal aspect of a topic is determined by the frequency of its occurrence
in Twitter data streams over a given time period, whereas the spatial aspect refers to the
regional distribution of messages mentioning the given topic over the same time period.
Although this approach looks promising, its usefulness for measuring SR of topics has
not been fully proven yet.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3 Normalized Micropost Distance</title>
      <p>
        Inspired by the work of Cilibrasi et al. [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] on establishing Normalized Google Distance
(NGD) as an MSR of terms based on Google search results, we propose a similar measure
– Normalized Micropost Distance (NMD) – based on the results of searching the
content (i.e., microposts) of real-time (Twitter-like) services. By leveraging micropost
streams of real-time services, this measure should reflect the change in terms’ SR more
quickly than standard web search results, which are not updated in real time. The basic
assumption behind our approach is that Google’s Search API results tend to be stable
and based on content with a lower frequency of change, and as such are not as
good at indicating changes in the SR of terms as search results
based on real-time content.
      </p>
      <p>NGD uses the frequencies of appearance of two terms in the Google index, as well as
the frequency of their mutual appearance to quantify the extent to which the two terms
are related. The basic assumption behind this measure is that terms that co-occur more
frequently would be more related. Similarly, the proposed NMD measure can be
calculated using the formula (1).</p>
      <p>
        <disp-formula>
          <label>(1)</label>
          <tex-math>\mathrm{NMD}(x,y)_t = \frac{\max\{\log f(x)_t, \log f(y)_t\} - \log f(x,y)_t}{\log M - \min\{\log f(x)_t, \log f(y)_t\}}</tex-math>
        </disp-formula>
      </p>
      <p>The formula allows one to calculate the NMD of two terms x and y for the time
interval t. f(x)<sub>t</sub> and f(y)<sub>t</sub> represent the number of results returned for the terms x and y,
respectively, within the time interval t, when searching the content (i.e., microposts) of a
real-time, Twitter-like service; f(x,y)<sub>t</sub> is the number of results containing both terms
within the same interval. By analogy with NGD, M plays the role of a normalizing factor,
i.e., the total number of microposts searched. The terms in the formula (x and y) may also be
compound terms. Calculating the value of this formula for the same terms over different
time intervals is essential for determining the dynamics of their relationship, as we
further explain in the following two sections.</p>
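      <p>As an illustration, formula (1) can be sketched in a few lines of Python. This is a minimal sketch and not the code of our application: the function name <monospace>nmd</monospace>, the reading of M as the total number of microposts in the interval, and the convention of returning infinity when a count is zero are all assumptions made for the example.</p>
      <preformat><![CDATA[
```python
import math


def nmd(f_x, f_y, f_xy, m):
    """Normalized Micropost Distance for one time interval, per formula (1).

    f_x, f_y -- number of microposts matching each term in the interval
    f_xy     -- number of microposts matching both terms
    m        -- normalizing constant M (assumed: total microposts in the interval)
    """
    if min(f_x, f_y, f_xy) <= 0:
        # log(0) is undefined; as an assumed convention, treat terms that
        # never (co-)occur in the interval as maximally distant.
        return math.inf
    log_x, log_y = math.log(f_x), math.log(f_y)
    denominator = math.log(m) - min(log_x, log_y)
    if denominator <= 0:
        raise ValueError("m must exceed the smaller of the two term counts")
    return (max(log_x, log_y) - math.log(f_xy)) / denominator
```
]]></preformat>
      <p>With these conventions, two terms that always appear together (f_x = f_y = f_xy) have a distance of 0, and the distance grows as co-occurrence becomes rarer relative to the individual frequencies.</p>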
    </sec>
    <sec id="sec-4">
      <title>4 Detecting the Significance of Change</title>
      <p>The notion of NMD defined above is useful for measuring the difference in SR of two
terms, but will not, by itself, help to detect changes worthy of notice and distinguish
them from small and frequent variations. We suggest two complementary ways to
perform this detection.</p>
      <p>First, calculating the standard deviation of NMDs over a longer period of time would
provide a sound basis for judging the significance of the identified changes. The standard
deviation of NMDs can be calculated using formula (2), which
gives the standard deviation of NMDs over a sample of N observations in which
NMDs were calculated in time intervals i of equal length.</p>
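      <p>
        <disp-formula>
          <label>(2)</label>
          <tex-math>\sigma(\mathrm{NMD}(x,y)) = \sqrt{\frac{\sum_{i=1}^{N} \left(\mathrm{NMD}(x,y)_i - \mathrm{avg}(\mathrm{NMD}(x,y))\right)^2}{N}}</tex-math>
        </disp-formula>
      </p>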
      <p>Detection of a change in terms’ SR (measured using NMD) that is greater than the
standard deviation σ could be an indicator of a significant change.</p>
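      <p>The check described above might be sketched as follows. The sketch assumes that a "change" means the difference between NMD values of two consecutive intervals, and that σ is computed over the whole stored series; both are choices made for this example, and the function names are hypothetical.</p>
      <preformat><![CDATA[
```python
import math


def population_std(values):
    """Standard deviation over N observations, as in formula (2) (divides by N)."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((v - mean) ** 2 for v in values) / n)


def significant_changes(nmd_series):
    """Return indices i where |NMD_i - NMD_(i-1)| exceeds sigma of the series."""
    sigma = population_std(nmd_series)
    return [
        i
        for i in range(1, len(nmd_series))
        if abs(nmd_series[i] - nmd_series[i - 1]) > sigma
    ]
```
]]></preformat>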
      <p>In addition to this indicator, one could observe the stability of change over several
consecutive time instances to make sure that the change is not too short-lived.
However, such a criterion may not be generally applicable and is specific to each use
case, as even short changes might matter in some use cases, while in others only a
change that spans several days would be significant.</p>
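      <p>One possible way to express the stability criterion is to require that NMD stays away from a reference value for a minimum number of consecutive intervals. The sketch below is only an illustration: the parameters <monospace>baseline</monospace> (e.g., the long-term average NMD) and <monospace>min_run</monospace> (the use-case-specific number of intervals) are assumptions introduced here.</p>
      <preformat><![CDATA[
```python
def sustained_shift(nmd_series, baseline, sigma, min_run):
    """Return the index at which NMD first stays more than sigma away from
    baseline for min_run consecutive intervals, or None if it never does."""
    run = 0
    for i, value in enumerate(nmd_series):
        if abs(value - baseline) > sigma:
            run += 1
            if run == min_run:
                return i - min_run + 1  # start of the sustained run
        else:
            run = 0  # the deviation was interrupted; start counting again
    return None
```
]]></preformat>
      <p>A use case where even short changes matter would set <monospace>min_run</monospace> to 1, whereas a use case interested only in multi-day changes would choose a larger value.</p>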
    </sec>
    <sec id="sec-5">
      <title>5 Demo Application</title>
      <p>In order to test the proposed approach of using micropost streams to calculate SR of
terms, we have developed a simple web application that makes use of the Twitter Search
API<sup>3</sup> for computing NMD. The application, entitled Tweet Dynamics and currently in
private beta, demonstrates how the NMD measure can be utilized, visualized and
interpreted. The application is built in the Java programming language using the Tapestry web
framework<sup>4</sup>. Flot<sup>5</sup>, a JavaScript plotting library for jQuery, is used for plotting the
result diagram.</p>
      <p>The application’s home page presents the user with a simple interface (Figure 1) for
entering the number of days and the two keywords for which NMDs should be
calculated. Clicking the ‘Calculate’ button invokes the NMD calculation process.
The application then queries the Twitter API to get all posts containing the first
keyword, then the posts containing the second keyword, and finally all the posts
containing both keywords. This process is repeated for the given number of days. With
that data, NMDs are calculated according to formula (1).
<fn><label>3</label><p>http://search.twitter.com/api/</p></fn>
<fn><label>4</label><p>http://tapestry.apache.org/</p></fn>
<fn><label>5</label><p>http://code.google.com/p/flot/</p></fn></p>
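      <p>The per-day calculation process can be outlined as below. This is a simplified Python sketch rather than the Java code of the application: <monospace>count_posts</monospace> is a hypothetical caller-supplied function standing in for the search requests, the two keywords are joined by a space on the assumption that the search treats this as a conjunction, and M is passed in as a constant.</p>
      <preformat><![CDATA[
```python
import math
from datetime import date, timedelta


def daily_nmd(count_posts, term_a, term_b, days, end_day, m):
    """Compute one NMD value per day for the `days` days ending on `end_day`.

    count_posts(query, day) is a caller-supplied (hypothetical) function that
    returns how many microposts posted on `day` match `query`.
    """
    series = []
    for offset in range(days - 1, -1, -1):  # oldest day first
        day = end_day - timedelta(days=offset)
        f_a = count_posts(term_a, day)
        f_b = count_posts(term_b, day)
        f_ab = count_posts(term_a + " " + term_b, day)  # assumed: space means AND
        if min(f_a, f_b, f_ab) <= 0:
            series.append((day, None))  # NMD is undefined when a count is zero
            continue
        log_a, log_b = math.log(f_a), math.log(f_b)
        value = (max(log_a, log_b) - math.log(f_ab)) / (math.log(m) - min(log_a, log_b))
        series.append((day, value))
    return series
```
]]></preformat>
      <p>The returned list of (day, NMD) pairs corresponds to the points plotted on the result diagram, one per day.</p>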
      <p>The result of calculation is shown in a diagram (Figure 2) where each day is
presented as a dot on the diagram line. One can easily perceive a trend of SR between
two keywords during the past days.</p>
      <p>Although our application keeps the computed values of NMD for the purpose of calculating
standard deviation, the value of standard deviation is not shown in Figure 2,
since we do not yet have a significant sample of values (e.g., dating from at least a
month ago); taking into account the currently available value of standard
deviation would thus not be methodologically sound. Once a significant sample is present,
the user will see a second line representing the standard deviation, making it possible to
spot when the change in NMD becomes significant.</p>
      <p>Although some Web actors have access to the total history of tweets, most
interested parties have quite limited access to the Twitter Search API, which allows up
to 1500 results per query. For high-frequency terms this can be a limiting factor, since
it makes it impossible to estimate their full frequency and compare it with that of other
high-frequency terms. A workaround that we use is to sample the tweets within short time
intervals in which the number of tweets per term is lower than the imposed limit. This,
however, involves the risk of hitting the limit of 150 requests to the Twitter API per
hour.</p>
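      <p>One way to realize such sampling is sketched below, purely as an illustration. The window is halved until the number of results falls below the cap and the count is then scaled up to a full day. The callable <monospace>count_in_window</monospace> is hypothetical, and the extrapolation assumes a roughly uniform posting rate over the day, which real data may well violate.</p>
      <preformat><![CDATA[
```python
def estimate_daily_count(count_in_window, day_seconds=86400, cap=1500, max_requests=150):
    """Estimate a term's daily frequency from one uncapped short window.

    count_in_window(start, length) is a hypothetical callable returning the
    number of results (at most `cap`) for posts in [start, start + length).
    The window is halved until its count falls below `cap`, then the count is
    scaled up to the full day, assuming a roughly uniform posting rate.
    Returns (estimated daily count, number of requests used).
    """
    length = day_seconds
    requests = 0
    while requests < max_requests:
        count = count_in_window(0, length)
        requests += 1
        if count < cap:
            return count * day_seconds / length, requests
        length /= 2  # still capped: try a window half as long
    raise RuntimeError("request budget exhausted before an uncapped window was found")
```
]]></preformat>
      <p>The request counter makes the trade-off explicit: each halving costs one more request against the hourly limit.</p>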
      <p>Another limitation of using Twitter’s Search API is the restriction on the temporal
range of tweets that can be returned as a search result. In particular, according to the
API’s documentation<sup>6</sup>, a post returned as a search result must not be ‘too old’, which in
practice amounts to about six days, meaning that the oldest post returned as a
result of a query is six days old. This restriction severely limits the ability to test our
application on microposts generated during a longer time span and to detect trends in
SR between keywords related to certain events or periods of the year. If we had access to
data spanning a longer period of time, we would have been able to test our results by
comparing them with various indicators such as survey results, sales changes of a
product, etc.
<fn><label>6</label><p>http://apiwiki.twitter.com/w/page/22554756/Twitter-Search-API-Method:-search</p></fn></p>
    </sec>
    <sec id="sec-6">
      <title>6 Scenarios of Use</title>
      <p>In this section we present two usage scenarios that illustrate the potential benefits
of the suggested dynamic MSR in real-life settings. The first scenario assumes the use
of the Twitter content stream for the calculation of NMD, whereas the second one relies on
the microposts exchanged in an (internal) micro-blogging tool of an organization.</p>
      <p><bold>Scenario 1: Adapting Online Advertising Campaigns to the Changes in Term
Relatedness</bold></p>
      <p>Optimization of the keyword choice for online advertising campaigns has become a
lively market with more and more players in the field. Using the information about
keyword similarity and relatedness, combined with the prices of keywords in advertising
services such as AdWords, it is possible to find a combination of keywords that costs
less, but drives the same or a larger amount of relevant traffic. Such services, however,
do not take advantage of keywords that become occasionally relevant. For instance, let
us consider the situation that occurred at this year’s SXSW<sup>7</sup> conference held in Austin,
Texas, USA. Many new iPad applications were showcased at the conference, and a
rumor appeared, which later proved true, that iPad 2 would start selling on the second
day of the conference. This trend would be noticed if NMD was measured for the words
‘ipad’ and ‘sxsw’. A company selling iPad accessories would on such an occasion have
a clear interest in altering the keywords of the AdWords campaign promoting its
products by adding the word ‘sxsw’, thus getting new relevant traffic. Once the NMD for
the two words goes up again, the advertising campaign can again be changed to avoid
driving traffic that has become less relevant.</p>
      <p>For advertising campaigns, responding to changes in terms’ relatedness over time
means not missing out on relevant traffic, and as such is of high importance for this market.
Web marketing tools such as KeywordDiscovery.com do offer the possibility to
discover relevant keywords and include them in marketing campaigns, but do not reflect
the change in this relevancy. Changes in relevancy might open completely new
possibilities for advertising campaign optimization, and using our notion of NMD, these
changes may even be taken into account in an automated or semi-automated way.
<fn><label>7</label><p>http://sxsw.com/</p></fn></p>
      <p><bold>Scenario 2: Facilitating Discovery of Relevant Resources in Organizations</bold></p>
      <p>Many organizations, especially larger ones, maintain organizational vocabularies and
use them for the annotation of different kinds of documents and other digital assets.
Such a vocabulary often results from the collaborative work of domain experts and a
knowledge engineer. Therefore, it tends to reflect the experts’ view of the subject
domain, and the terms it defines reflect the jargon used by these experts. However, this
jargon does not necessarily overlap with the everyday language used by the employees
within the organization. As a consequence, employees may experience difficulties in
formulating their requests for different kinds of organizational resources using the
organization’s official vocabulary. This indicates the need for harmonizing the official
and the actual vocabularies within an organization. Furthermore, each organization
evolves and many organizations need to go through continuous changes in order to
respond to the constantly changing conditions in their environment. To properly address
the evolving work practices in the organization, the organization’s vocabulary has to
evolve as well, and it should evolve to be comprehensible and usable by the employees
(i.e., it should incorporate the terminology used by the employees). This is where the
suggested dynamic MSR, applied over the messages exchanged in the organization’s
Twitter-like communication channels (e.g., Yammer<sup>8</sup>), can help. In particular, the
proposed MSR can be used for extracting terms related to certain tasks, projects,
organizational positions, etc., in order to use them for evolving the organization’s
vocabulary. This would increase the usability of the vocabulary and consequently
improve the search and discovery of organizational resources.</p>
      <p>
        The suggested dynamic MSR can also be applied for facilitating people search within
an organization by enabling the deduction of terms that best describe each employee.
Previous studies exploring the practice of people tagging in organizations [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ][
        <xref ref-type="bibr" rid="ref24">24</xref>
        ] have
confirmed that people do perceive such a practice as beneficial, as it allows for, e.g.,
finding out who is working on a certain project/task, or identifying experts in a
particular topic. However, the main obstacle to applying this practice in the workplace lies
in the very act of directly tagging (labeling) a person; many participants in the cited
studies were reluctant to directly tag their colleagues, as they were worried about
the unintended effects those tags might cause. With the proposed dynamic MSR
applied to the messages exchanged within the organization’s micro-blogging and/or
social streaming application, an organization would be able to identify the terms (tags)
related to each employee. These terms would still reflect the community’s perception of
any particular employee, while freeing people from the worry of
inadvertently affecting their colleagues.
      </p>
    </sec>
    <sec id="sec-7">
      <title>7 Example Diagrams</title>
      <p>In order to test the use of formula (1) on data gathered from Twitter over several
consecutive days for detecting the change in SR between two terms, we chose several
examples of term pairs whose popularity we, as humans, were able to perceive from the
news. The testing was done using our Tweet Dynamics application (cf. Section 5).</p>
      <p>Since, unfortunately, catastrophic events were happening in Japan at the time of
writing this paper<sup>9</sup>, we used the keywords ‘japan’ and ‘nuclear’ and calculated their NMDs
for 5 days starting from March 8, 2011.
<fn><label>8</label><p>https://www.yammer.com/</p></fn>
<fn><label>9</label><p>On March 11, 2011, a strong earthquake struck Japan, triggering a failure of the cooling
system of the reactor at Japan’s Fukushima nuclear power plant and causing a huge explosion at
the power plant the day after, on March 12.</p></fn></p>
      <p>By looking at the diagram (Figure 3), one can observe that before March 11 there was
little relatedness between the terms ‘japan’ and ‘nuclear’, since the earthquake
struck without warning; the value of NMD (shown on the Y axis) is therefore higher for those days. On the day of
the earthquake (March 11), one can see that the NMD decreased significantly, i.e., the SR
of the terms increased, as many people tweeted about the danger of an explosion at the
nuclear power plant. That trend continued in the following days.</p>
      <p>Our second example concerns the terms ‘ipad’ and ‘sxsw’ (already mentioned in Section
6). The iPad 2 went on sale unexpectedly during SXSW, on the 12th of March. From the
diagram, it is apparent that there had been a rumor about it some days earlier, as the NMD
decreased exponentially towards the first day of sales, reaching its lowest value on the
12th of March. It is easy to think of potential benefits that the owners of iPad-related
content might derive from this newly related term, by including it in their advertising
campaigns and using it for positioning their content. The relevant NMD diagram is
shown in Figure 4.</p>
      <p>As already mentioned, a major limitation of using the Twitter Search API is that
it limits the number of search results to a maximum of 1500. If we had access to the
whole corpus of messages posted in this period, we would have been able to measure
the change in relatedness more precisely. However, in the case of terms that are usually
largely unrelated, the significance of the change is still noticeable even with such
limitations imposed.</p>
    </sec>
    <sec id="sec-8">
      <title>8 Conclusions and Future Work</title>
      <p>
        This paper presents our initial work on using data streams from Twitter or Twitter-like
services for the detection of changes in semantic relatedness of terms. In particular,
being inspired by the work of Cilibrasi &amp; Vitanyi [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] on using Google search results
for computing semantic relatedness of terms, we have introduced Normalized Micropost
Distance (NMD). It makes use of micropost streams of Twitter-like services to compute
semantic relatedness of two terms for a given time period. We have also suggested how
our approach can be leveraged in two real-life scenarios that differ both in the
application domain (online advertising and organizational knowledge management) and
the data source to be used for the computation of the NMD measure (Twitter and
organization’s internal micro-blogging service).
      </p>
      <p>An important challenge to attack in our future work is the detection of good
candidate term pairs, i.e., pairs where a change is likely to happen. Our NMD measure
allows one to measure the change in semantic relatedness, and follow it over time, but
does not directly help in identifying which term pairs are likely to be the subject of
change without calculating the NMD values for all possible term pairs. Having such a
possibility is important in light of the need for computational efficiency and of the limits
imposed by Twitter and other major players on the real-time Web. The detection of
candidates for NMD calculation depends on the actual usage scenario, as each
real-life scenario is related to a specific subject domain characterized by its specific language
and important topics. Accordingly, for each scenario, there would be a list of terms to
watch. With such a list available, it would be enough to identify the candidate terms
that, when coupled with the watched terms could form pairs for which the calculation of
NMD might lead to the detection of significant relatedness. We believe that looking at
trending topics on Twitter, as well as in recent news articles, might help in finding good
candidate terms for a Web marketing scenario (as presented in Section 6). Our intention
is thus to explore this research question and deliver a system that could take a number of
terms to watch, and provide a list of terms that have recently become more related to
one or more of the watched terms.</p>
      <p>
        Another equally important direction of our future work is a comprehensive
evaluation of the proposed dynamic measure of semantic relatedness of terms. For that
purpose we intend to use Twitter’s Streaming API<sup>10</sup>, and in particular its “Gardenhose”
access level, which offers a proportion of Twitter’s public data stream (currently,
around 10%) that could form a statistically significant sample. This approach would
help us overcome the mentioned limitations of the Twitter Search API. In addition,
since Google recently started including real-time updates coming from Twitter, it could
serve as an important source of data. However, since at the time of writing these
data were not accessible through the Google Search API, we need to wait for this feature to
become programmatically available. Using this data stream, we intend to conduct an
evaluation study consisting of a comparative analysis of our approach and the
approach we found most closely related to our work, namely the approach reported in
Song et al. [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ].
      </p>
      <p>References</p>
      <p>Wagner, C. (2010). Exploring the Wisdom of the Tweets: Towards Knowledge Acquisition from
Social Awareness Streams. PhD Symposium at 7th Extended Semantic Web Conference
(ESWC2010), Heraklion, Crete, Greece: Springer. Retrieved March 10, 2011, from
http://www.springerlink.com/index/R4463T1333777N11.pdf.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <surname>Sheth</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Thomas</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Mehra</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          (
          <year>2010</year>
          ).
          <article-title>Continuous Semantics to Analyze Real-Time Data</article-title>
          .
          <source>IEEE Internet Computing</source>
          ,
          <volume>14</volume>
          (
          <issue>6</issue>
          ),
          <fpage>84</fpage>
          -
          <lpage>89</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <surname>Stankovic</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rowe</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Laublet</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          (
          <year>2010</year>
          ).
          <article-title>Mapping Tweets to Conference Talks: A Goldmine for Semantics</article-title>
          .
          <source>In Proceedings of the 3rd Social Data on the Web Conference, SDOW2010, collocated with the International Semantic Web Conference ISWC2010</source>
          . Shanghai, China.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <surname>Sigurbjörnsson</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Zwol</surname>
            ,
            <given-names>R. van.</given-names>
          </string-name>
          (
          <year>2008</year>
          ).
          <article-title>Flickr tag recommendation based on collective knowledge</article-title>
          .
          <source>Proceeding of the 17th international conference on World Wide Web - WWW '08</source>
          ,
          <fpage>327</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          New York, New York, USA: ACM Press. doi:10.1145/1367497.1367542.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <surname>Mei</surname>
            ,
            <given-names>Q.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhou</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Church</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          (
          <year>2008</year>
          ).
          <article-title>Query suggestion using hitting time</article-title>
          .
          <source>Proceeding of the 17th ACM conference on Information and knowledge mining - CIKM '08</source>
          ,
          <fpage>469</fpage>
          . New York, New York, USA: ACM Press. doi:10.1145/1458082.1458145.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <surname>Safar</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Kefi</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          (
          <year>2004</year>
          ).
          <article-title>OntoRefiner, a user query refinement interface usable for Semantic Web Portals</article-title>
          .
          <source>Proceedings of Application of Semantic Web technologies to Web Communities, Workshop ECAI'04</source>
          (pp.
          <fpage>65</fpage>
          -
          <lpage>79</lpage>
          ). Retrieved January 25, 2011, from http://scholar.google.com/scholar?hl=en&amp;btnG=Search&amp;q=intitle:OntoRefiner,+a+user+query+refinement+interface+usable+for+Semantic+Web+Portals#0.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <surname>Macdonald</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Ounis</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          (
          <year>2007</year>
          ).
          <article-title>Expertise drift and query expansion in expert search</article-title>
          .
          <source>Proceedings of the sixteenth ACM conference on Conference on information and knowledge management - CIKM '07</source>
          ,
          <fpage>341</fpage>
          . New York, New York, USA: ACM Press. doi:10.1145/1321440.1321490.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <surname>Cross</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          , “
          <article-title>Semantic Relatedness Measures in Ontologies Using Information Content and Fuzzy Set Theory,”</article-title>
          <source>In Proc. of the 14th IEEE Int'l Conf. on Fuzzy Systems</source>
          , (
          <year>2005</year>
          ), pp.
          <fpage>114</fpage>
          -
          <lpage>119</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <surname>Gasevic</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zouaq</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Torniai</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jovanovic</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hatala</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <article-title>"An Approach to Folksonomy-based Ontology Maintenance for Learning Environments,"</article-title>
          <source>IEEE Transactions on Learning Technologies</source>
          ,
          <year>2011</year>
          (in press)
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <surname>Waltinger</surname>
            ,
            <given-names>U.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cramer</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Wandmacher</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          (
          <year>2009</year>
          ).
          <article-title>From Social Networks To Distributional Properties: A Comparative Study On Computing Semantic Relatedness</article-title>
          . Cognitive Science.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <surname>Burton-Jones</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Storey</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sugumaran</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Purao</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          (
          <year>2003</year>
          ).
          <article-title>A heuristic-based methodology for semantic augmentation of user queries on the web</article-title>
          .
          <source>Conceptual Modeling - ER 2003</source>
          ,
          <fpage>476</fpage>
          -
          <lpage>489</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <surname>Ziegler</surname>
            ,
            <given-names>C.-N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Simon</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Lausen</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          (
          <year>2006</year>
          ).
          <article-title>Automatic Computation of Semantic Proximity Using Taxonomic Knowledge Categories and Subject Descriptors</article-title>
          .
          <source>CIKM '06 Proceedings of the 15th ACM international conference on Information and knowledge management</source>
          (pp.
          <fpage>465</fpage>
          -
          <lpage>474</lpage>
          ). Arlington, Virginia, USA: ACM New York, NY, USA.
          <string-name>
            <surname>Maguitman</surname>
            ,
            <given-names>A. G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Menczer</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Roinestad</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Vespignani</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          (
          <year>2005</year>
          ).
          <article-title>Algorithmic detection of semantic similarity</article-title>
          .
          <source>Proceedings of the 14th international conference on World Wide Web</source>
          (pp.
          <fpage>107</fpage>
          -
          <lpage>116</lpage>
          ). ACM. Retrieved January 25, 2011.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <surname>Resnik</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          (
          <year>1995</year>
          ).
          <article-title>Using Information Content to Evaluate Semantic Similarity in a Taxonomy</article-title>
          .
          <source>arXiv preprint cmp-lg/9511007</source>
          . Retrieved January 24, 2011, from http://arxiv.org/abs/cmp-lg/9511007
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <surname>Matos</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Arrais</surname>
            ,
            <given-names>J. P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Maia-Rodrigues</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Oliveira</surname>
            ,
            <given-names>J. L.</given-names>
          </string-name>
          (
          <year>2010</year>
          ).
          <article-title>Concept-based query expansion for retrieving gene related publications from MEDLINE</article-title>
          .
          <source>BMC Bioinformatics</source>
          ,
          <volume>11</volume>
          , 212. doi:10.1186/1471-2105-11-212.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <surname>Salton</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          and
          <string-name>
            <surname>McGill</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <source>Introduction to Modern Information Retrieval</source>
          . New York: McGraw-Hill,
          <year>1983</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <surname>Cilibrasi</surname>
            ,
            <given-names>R. L.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Vitanyi</surname>
            ,
            <given-names>P. M. B.</given-names>
          </string-name>
          (
          <year>2007</year>
          ).
          <article-title>The Google Similarity Distance</article-title>
          .
          <source>IEEE Transactions on Knowledge and Data Engineering</source>
          ,
          <volume>19</volume>
          (
          <issue>3</issue>
          ),
          <fpage>370</fpage>
          -
          <lpage>383</lpage>
          . doi:10.1109/TKDE.2007.48.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <surname>Gracia</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Mena</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          (
          <year>2008</year>
          ).
          <article-title>Web-Based Measure of Semantic Relatedness</article-title>
          .
          <source>In Proceedings of the 9th international conference on Web Information Systems Engineering (WISE '08)</source>
          , pp.
          <fpage>136</fpage>
          -
          <lpage>150</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <surname>Strube</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Ponzetto</surname>
            ,
            <given-names>S. P.</given-names>
          </string-name>
          (
          <year>2006</year>
          ).
          <article-title>WikiRelate! Computing semantic relatedness using Wikipedia</article-title>
          .
          <source>Proceedings of the National Conference on Artificial Intelligence</source>
          (Vol.
          <volume>21</volume>
          , p.
          <fpage>1419</fpage>
          ). Menlo Park, CA; Cambridge, MA; London; AAAI Press; MIT Press; 1999. Retrieved February 22, 2011.
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <surname>Mendes</surname>
            ,
            <given-names>P. N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Passant</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Kapanipathi</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          (n.d.).
          <article-title>Twarql: Tapping into the Wisdom of the Crowd</article-title>
          .
          <source>Proceedings of the 6th International Conference on Semantic Systems</source>
          (pp.
          <fpage>1</fpage>
          -
          <lpage>3</lpage>
          ). Graz, Austria: ACM. Retrieved March 14, 2011, from http://portal.acm.org/citation.cfm?id=1839762.
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <surname>Sakaki</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Okazaki</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Matsuo</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          (
          <year>2010</year>
          ).
          <article-title>Earthquake shakes Twitter users: real-time event detection by social sensors</article-title>
          .
          <source>Proceedings of the 19th international conference on World wide web</source>
          (pp.
          <fpage>851</fpage>
          -
          <lpage>860</lpage>
          ). ACM. Retrieved March 6, 2011.
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <surname>Song</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>Q.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Zheng</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          (
          <year>2010</year>
          ).
          <article-title>A spatio-temporal framework for related topic search in micro-blogging</article-title>
          .
          <source>In Proceedings of the 6th international conference on Active media technology (AMT'10)</source>
          ,
          <string-name>
            <given-names>Aijun</given-names>
            <surname>An</surname>
          </string-name>
          , Pawan Lingras, Sheila Petty, and Runhe Huang (Eds.). Springer-Verlag, Berlin, Heidelberg,
          <fpage>63</fpage>
          -
          <lpage>73</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <surname>Nagarajan</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gomadam</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sheth</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ranabahu</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mutharaju</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jadhav</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Spatio-temporal-thematic analysis of citizen sensor data: Challenges and experiences</article-title>
          .
          <source>Web Information Systems Engineering - WISE 2009</source>
          , pp.
          <fpage>539</fpage>
          -
          <lpage>553</lpage>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <surname>Farrell</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lau</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wilcox</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Muller</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          “
          <article-title>Socially Augmenting employee profiles with people-tagging</article-title>
          ,”
          <source>Proceedings of the 20th annual ACM symposium on User interface software and technology</source>
          , Newport, Rhode Island, USA,
          <year>2007</year>
          , pp.
          <fpage>91</fpage>
          -
          <lpage>100</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <surname>Braun</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kunzmann</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Schmidt</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          (
          <year>2010</year>
          ).
          <article-title>People Tagging &amp; Ontology Maturing: Towards Collaborative Competence Management</article-title>
          . In: David Randall and Pascal Salembier (eds.):
          <source>From CSCW to Web 2.0: European Developments in Collaborative Design, Selected Papers from COOP08</source>
          , Springer, Berlin/Heidelberg.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>