=Paper=
{{Paper
|id=None
|storemode=property
|title=Ins and Outs of News: Twitter as a Real-Time News Analysis Service
|pdfUrl=https://ceur-ws.org/Vol-694/paper2.pdf
|volume=Vol-694
}}
==Ins and Outs of News: Twitter as a Real-Time News Analysis Service==
        Ins and Outs of News: Twitter as a Real-Time News
                         Analysis Service
        Arjumand Younus, M. Atif Qureshi                                          Ahmad Nauman Ghazi
     Department of Computer Science, KAIST,                                     Blekinge Tekniska Högskola,
                 Daejeon, Korea                                                      Ronneby, Sweden
    arjumandms@kaist.ac.kr, atifms@kaist.ac.kr                                     angh08@student.bth.se
     Samina Mumtaz, Muhammad Saeed                                        Nasir Touheed, M. Shahid Qureshi
    Department of Computer Science, Karachi                               Institute of Business Administration,
          University, Karachi, Pakistan                                              Karachi, Pakistan
saeed@uok.edu.pk, saminamumtaz08@gmail.com                            ntouheed@iba.edu.pk, mqureshi@iba.edu.pk
ABSTRACT                                                            information retrieval there is a clear need for the use of
Both time and user popularity play a crucial role within the        approaches that take into account human factors; this is also
domain of news search - the fundamental problem lies in             the case for the news production and presentation process
integrating these two onto a single platform for news               [24]. “Popularity over time” is a crucial human factor
retrieval systems. In this paper we propose techniques for          within the news retrieval and presentation domain.
enhancement of these systems by identifying the news
topics that are popular over a certain period of time. The          Our notion of “popularity over time” with respect to news
notion of “popularity over time” is studied with the help of        services encompasses two concepts: user popularity in the
the well-known real-time microblogging service “Twitter”            form of hot news topics and time which in the case of news
– our method performs linguistic analysis of the news data          services relates to freshness of news. Both these aspects
published daily on news sites for extraction and detection of       play a very significant role in news information retrieval:
news topics that are in high demand using Twitter. We also          popularity is a notion that has long been studied by
present a prototype of our framework which detects popular          information retrieval researchers [14] and recent focus of
news in real-time. The results obtained suggest the need of         researchers in this domain is shifting towards temporal
taking into account users’ interest for effective news              aspects of documents [1] – however what is being ignored
services and strongly imply that harnessing of micro-               in these efforts is a need to integrate these two important
blogging data for this purpose can lead to surprising               concepts which is a significant requirement for the special
outcomes.                                                           case of news information retrieval systems. This paper is a
                                                                    step in this direction and it proposes an approach for
Author Keywords                                                     incorporation of both these concepts.
Twittersphere, online news services, news retrieval                 The goal of this paper is to demonstrate how information
systems, popularity over time                                       from a micro-blogging site can be exploited to study the
                                                                    evolution in news topics’ popularity. We specifically
ACM Classification Keywords                                         demonstrate how to use Twitter to automatically identify
H.3.5 Online Information Services: Web-based services.              the hot news topics that Internet users are most concerned
                                                                    about – furthermore, we do this repetitively in real-time as
1   INTRODUCTION                                                    and when news outlets produce more news. The paper also
The Social Web has introduced a major paradigm shift in             includes the description of our prototypical framework for
the way the World Wide Web is envisioned and this has               identification and detection of popular news topics with
also had a significant effect on the users’ news consumption        time in which we apply natural language processing
patterns – users share their chosen news (that is, news of          techniques on the news data and integrate our framework
interest to them or popular among users) through the                with the Twitter Search API. At the core of the framework
social media platforms and as such there is an increased            is a visual interface [20] for effective navigation across the
need for effective news services that take into consideration       news data and Tweets’ data which in turn enables us to
the users’ interest in the news [2, 5, 10, 24]. Furthermore,        assess in real-time the popularity of a particular news article
search engines are the primary means through which users            on the Social Web.
access online news [15]. Hence in the domain of news
Additionally through experimental evaluations on real-time       that they use Twitter as a social sensor for emergency
news data we provide empirical evidence supporting the           events such as earthquakes. Shamma et al. [19] analyze
following claims:                                                tweets in the face of live media events and conclude that
                                                                 tweets in the context of any particular media event give
        Twitter acts as an effective news sensor for popular     significant insights into the semantic structure and content
        news topics and items.                                   of the event.
        Twitter’s social graph can give important insights       There is not much work on analysis of Twitter with respect
        into the news that is really meaningful to the           to news data. To the best of our knowledge the only such
        readers.                                                 work is by Sankaranarayanan et al. [18] –
        Twitter acts as a crucial information gathering point    TwitterStand is basically a news processing system built on
        in the face of events that make breaking news.           top of Twitter; it captures tweets corresponding to late
                                                                 breaking news and produces a distributed news wire service
  These findings suggest rich ways in which next-                from these tweets. Our work differs from TwitterStand in
generation news services can utilize users’ interest in the      that we use Twitter as an indicator for assessing news
news to improve the online news experience.                      popularity; moreover our service combines news’ data and
   The rest of the paper is organized as follows. Section 2      tweets’ data on a single platform. This is explained in
discusses related work in this dimension. Section 3 presents     further detail in Section 4.
a discussion of how Twitter is being used as an effective
real-time news delivery service; it also includes results of a   3     OVERVIEW OF EXISTING NEWS SERVICES
user survey we conducted to analyze users’ news sharing          In this section we provide an overview of the existing ways
patterns on Twitter along with a brief summary of existing       of news delivery – at the same time we discuss the role of
news services and their limitations. Section 4 describes our     Twitter as a real-time news service and we also look at
methodology in detail along with an explanation of the           questionnaire data of an online user survey we conducted as
underlying techniques. Section 5 presents the experimental       part of a preliminary study on Twitterers’ perception of
results. Finally Section 6 concludes the paper with              Twitter as a news service.
directions for future research.
                                                                 3.1    Traditional News Services
2   RELATED WORK                                                 The oldest class of services that fall under the umbrella of
There have been extensive studies and analysis on various        traditional services are online versions of newspapers and
user-generated content such as blogs, social networks and        television news with the average monthly reach of web
micro-blogging sites. This work has been carried out in          newspapers among Internet households increasing from
various dimensions. In this section we provide a brief           27.4% in 2004 to 40.9% in 2008 [13]; furthermore many
overview of works that are similar to ours along with a          television channels now make news clips available online.
discussion of how our contribution differs from existing         Another traditional news service constitutes search
work.                                                            companies – according to the most recent Pew Research
                                                                 biennial news consumption survey [15] conducted in 2008,
Works on the analysis of blogs proceed along the same lines as   Yahoo (28%) and MSN (19%) are the most frequented
we do in this paper by analyzing the link between blogs and      websites among Web news users. Search engines also
news articles thereby extracting and monitoring news trends      aggregate news from many sources at the same time
from the blogosphere [4, 6, 23]. However with the                categorizing them into news categories of World, Business,
emergence of social media and in particular micro-blogging       Sports etc. – these engines also present news clusters across
platforms such as Twitter there has been an increasing trend     multiple sources. Then there are also the services of
of news sharing and searching [21] on these media with           email/mobile news alerts and RSS feeds that allow users to
blogs now serving as a point where users publish their           get news on demand.
reactions to the news and social networking sites have
replaced their role as a news sharing medium.                    An inherent limitation in all the above-mentioned news
                                                                 sources is that there is little or no consideration for the
Due to this recent development in the Internet landscape         notion of “news popularity over time” – the traditional
within the past three years there has been an increasing         services do not take into account the local or user
amount of research on Twitter. Java et al. [9] performed a       preferential buzz factor for the various news items/topics
study on the topological and geographical characteristics of     which is a very significant requirement in today’s rich
Twitter – their findings reveal that people on Twitter use it    media.
as a communication service and connect with people of
similar interests. Hughes et al. [8] examine usage of Twitter    3.2    Exploratory News Services
in the face of crisis situations and their findings imply that   Recently researchers in the information retrieval domain are
under emergency events Twitter assumes the role of a             shifting towards exploratory search paradigms for news
significant information broadcasting medium. Sakaki et al.       services which allow users to explore and discover how
[17] also perform a similar study but their work differs in      entities such as people and locations associated with a news
query change over time. Some examples include Yahoo’s               in both dimensions i.e., open-ended and close-ended to
Time Explorer [12] and Google News Timeline                         maintain the criteria of required information gathering.
[7]; this search service incorporates a time-based clustering       Most respondents (84%) reported reading tweets one or
algorithm which also peers into the future based on                 more times per day. Forty seven (92%) of the respondents
predictions in news articles about the future events. The           agreed that they used Twitter to update themselves about
search results associated with a news query are arranged on         current issues and news and forty two (82%) respondents
a timeline and it allows users to interact with time and            indicated a preference to share happenings around them on
entities in a powerful way.                                         Twitter. Another important indicator used in this
                                                                    questionnaire was the marking on a scale of 1-5 (1
                                                                    indicating lowest preference and 5 indicating highest) for
                                                                    the role of various sites as information-finding places: we
                                                                    included traditional media i.e., TV channels, newspapers,
                                                                    online news sites and search engines and the new media
                                                                    i.e., Twitter. The results of this marking yield significant
                                                                    insights into role of Twitter as a real-time news service and
                                                                    this is what motivates us for use of this service in studying
                                                                    news popularity over time: 29 (57%) of the respondents
                                                                    placed Twitter’s role on a higher or equal scale to that of
                                                                    search engines like Google and other news sites; seventeen
                                                                    respondents (33%) placed Twitter lower than other
                                                                    traditional media but the surprising aspect is that the
                                                                    difference in marking levels was only by a factor of 1 i.e.,
                                                                    these users marked traditional media on level 5 and Twitter
                                                                    on level 4. This leads to a very significant observation that
                                                                    the role of Twitter as a real-time news service is increasing
                                                                    very rapidly. This serves as the basic motivation behind our
                                                                    approach of utilizing Twitter for a study of news evolution
                                                                    and assessment of user popularity regarding the various
                                                                    news items.

                                                                    4     STUDYING NEWS EVOLUTION WITH TWITTER
                                                                    In this section we provide a detailed description of our
                                                                    proposed framework for real-time detection of popular
                                                                    news topics. In Section 4.1 we begin with a discussion of
                                                                    the dimensions along which news retrieval systems align
                                                                    the news data and we mention the dimensions of our focus
                                                                    for the proposed framework. Section 4.2 describes the
                                                                    methodology in detail whereas Section 4.3 focuses on the
                                                                    user interface aspects of our prototype. Finally in Section
                                                                    4.4 we discuss our ranking approach for determining the
                                                                    popular news topics from tweets containing news data; the
                                                                    ranking function is formulated on the basis of Twitter’s
                                                                    social graph.

Although such an exploratory news search engine presents
news in a very intuitive format, we believe it has the same
problems as those of traditional news services because there
is still a strong need to integrate the concept of popularity
with time. Systems like Time Explorer are more focused on
temporal aspects of news items which will eventually lead
to the same problem as in traditional news services.
Furthermore these systems are still in a phase of infancy
and their use is not widespread till now. We advocate that
such systems will be highly beneficial and effective if they
incorporate both user popularity and temporal aspects of
news documents.

3.3   Twitter as a Real-Time News Service
Twitter has emerged as a popular micro-blogging service –
it is structured in such a manner that in addition to being a
micro-blogging service it is also considered to be a social
network [11]. Twitter users follow others or are followed –
this relationship requires no reciprocation unlike other
online social networking sites such as Facebook or
MySpace. Being a follower implies that a user receives all
the messages called tweets (where each tweet contains a
maximum of 140 characters) from those the user follows.
Other special features of Twitter include the ability to
converse with other members of the Twittersphere using the
@ symbol; RT stands for a retweet which refers to the
practice of sharing the same content as that shared by
another Twitterer – this is also another mode of replying on
Twitter. Another special feature of tweets is that they
sometimes contain the hash symbol “#” referred to as
hashtags. By including a hashtag in a tweet, the user
originating the tweet is suggesting that the word denoted by
the hashtag makes for a good candidate as a search key for
the tweet.
Twitter has become a very attractive news on-the-fly
medium; in fact many a time using Twitter one gets                  4.1    Dimensions for a News Retrieval System
information on newsworthy events even before the news               A news retrieval system has to take into consideration news
organizations do [17, 18]. In fact recent research has              data along three dimensions as Figure 1 shows.
demonstrated the use of Twitter as an effective information-
finding medium especially with respect to temporally
relevant information [21].
We conducted an online survey as a preliminary study on
how users of Twitter perceive the role of Twitter as a news
medium in which 51 active Twitter users participated and
this helped us in laying the groundwork for the subsequent
analysis. The questionnaire was designed on a funnel-
shaped pattern to generate the flow of information from
general to specific. The mode of the questionnaire was kept                Figure 1: News Data Dimensions for a News Retrieval System
Time has long been an integral part of search engine                news data gathered from January to August, 2010 and
ranking with most major search engines giving a ranking             the entities with high document frequencies during this
boost for recently published documents particularly in the          interval are part of the common entity corpus;
news domain [12]. Furthermore, news ranking sites use the           additional entities within this common entity corpus
sharing of news items across social networks for assessing          include those entities that are very common in news
the popularity – algorithms like PageRank [14] are also             documents such as names of countries, cities, famous
used for popularity measurement of different news sites.            personalities, government officials etc. The entity
Moreover as we mentioned in Section 3.1 news search                 extractor then sends the entities, each tagged as common
engines also present news stories as clusters across multiple       or uncommon, to the
sources [24]. However, as we mentioned in Section 1 the             query composer module which uses Boolean query
fundamental problem lies in not integrating these three             predicates for the search task: those entities that are
dimensions onto a single platform for news retrieval                common are used with AND and those not common are
systems. In this paper we consider two of these dimensions          used with OR; if all the entities for a particular news
simultaneously: user popularity and time.                           article are common entities then the news title is used
                                                                    for composition of the query.
4.2 Proposed      Methodology      for   Assessing     News         Twitter Module: This module is responsible for the
Popularity                                                          fetching of the tweets relevant to the news articles. The
In this section we describe our proposed system in detail.          query composer in the news module sends a composed
Figure 2 shows the architecture of our system.                      query to the Twitter Search API: a query corresponding
                                                                    to each news article is sent. The API returns tweets that
                                                                    correspond to each news item; each returned tweet has
                                                                    associated metadata such as timestamp, screen name of
                                                                    user who tweeted it, whether it was in reply to someone
                                                                    etc. After this essential step in this module comes the
                                                                    task of ranking all the news articles – we use tweets
                                                                    extracted per news article for this task i.e., for each
                                                                    tweet relevant to a news article we extract the friends
                                                                    and followers of the user who tweeted it and apply our
                                                                    ranking function to rank each news article with respect
                                                                    to its popularity on Twitter. We explain this ranking
                                                                    function in further detail in Section 4.4.
                                                                    UI Module: this module comprises the user interface
                                                                    part of our system. It takes an input from both the
                                                                    News module and the Twitter module. The News
                                                                    module sends it an unranked list of news articles for
                                                                    each of the four days whereas the Twitter module sends
                                                                    it a ranked list of news articles for each of the four
                                                                    days. This module is further explained in Section 4.3.
              Figure 2: Overview of the System
                                                                Table 1 gives an algorithmic overview of our approach. The
The system is composed of three modules:                        items marked in italics hold special meaning with respect to
    News Module: as shown in the figure this module             the proposed system.
    performs the daily crawl of news articles; the crawler
    [16] follows a refresh policy, that is, each
    day’s newly published news articles are added to the
    archive. Currently the system provides a visual
    interface for displaying news articles of the past four days
    only (as a prototype, on the assumption that hot news items
    generally remain of interest for at most four
    days). Each news article is parsed into summary, title
    and news text. The entity extractor in the news module
    takes as input the parsed news data – it processes the
    news summary (provided by the news provider) part to
    extract the named entities of each news article using a
    named entity recognition approach [22]. These named
    entities are then matched across the common entity
    corpus on disk – this corpus is created from parsed
Figure 3: Snapshot of application timeline on 29th October, 2010
1.    Crawl daily news data and send the unranked news articles list to        system explained in Section 4.2 is that it will not
      the UI module of the system.                                             distinguish the tweets disseminated on news articles by
2.    Extract news title, news summary and news text for each news
      article per day.                                                         these news outlets – hence we devise a mechanism to do
3.    From the news summary extract named entities per news article            this using the Twitter social graph. One obvious point to
      through a named entity recognition approach.                             note in the social network of these news and media outlets
4.    Match each named entity across entities in the common entity             is that they have a large number of followers but a
      corpus and tag each named entity per news article as common or
      uncommon.                                                                negligible number of friends i.e., since they are news outlets
5.    Use Boolean query model to compose the query per news article:           many people follow them on Twitter but they follow none.
            a.      Use AND predicate with each common entity and OR           Hence it makes sense to utilize the following-to-follower
                   predicate with each uncommon entity. If all entities for    ratio for ranking of the news articles. Despite a lot of news
                   a particular news article are common use news title for
                   the query construction.                                     providers tweeting a particular news item, common users
6.    For all articles per day:                                                would also tweet it if the news is really popular among
            a.     Send a request to Twitter Search API and extract result     users. Moreover, if it is broadcast but users do not take
                   tweets per news article per day                             interest in it then such a news item cannot be a popular
            b.     For each tweet t in the result tweets:
                           i. Use t’s metadata to find following and           news item (it may be over-projected news). Equation 1
                               follower statistics for each unique Twitterer   shows the formula that we use.
                               who has tweeted about the news article
                          ii. Calculate rank of each article using the
                               ranking function (explained in detail in                 $\mathrm{rank}(\mathit{news}) = \sum_{\text{all Twitterers of } \mathit{news}} \frac{\mathit{followings}}{\mathit{followers}} \qquad (1)$
                               Section 4.4)
7.    Send the ranked news articles list to the UI module of the system
         Table 1: Algorithmic Flow of News Popularity                          5     EXPERIMENTAL EVALUATIONS
                    Ranking using Twitter                                      This section presents the details of the experiments we
                                                                               conducted over a period of 10 days from 17th October to
                                                                               26th October, 2010.
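
Steps 4 and 5 of Table 1 can be read as a small query-composition routine. The sketch below is one plausible interpretation and not the authors' implementation. It assumes that the common entity corpus is available as a set of lower-cased strings, that every common entity is required (AND) together with at least one uncommon entity (OR), and that the news title is used when no uncommon entity exists. The function names and the generic AND/OR output syntax are hypothetical; the exact operator syntax sent to the Twitter Search API is not specified in the text.

```python
from typing import Iterable, List, Set, Tuple


def tag_entities(entities: Iterable[str], common_corpus: Set[str]) -> List[Tuple[str, bool]]:
    """Tag each named entity as common (True) or uncommon (False).

    Assumption: the common entity corpus is a set of lower-cased names.
    """
    return [(entity, entity.lower() in common_corpus) for entity in entities]


def quote(term: str) -> str:
    """Quote multi-word entities so they are treated as one phrase."""
    return f'"{term}"' if " " in term else term


def compose_query(title: str, entities: Iterable[str], common_corpus: Set[str]) -> str:
    """Compose a Boolean query for one news article (hypothetical helper)."""
    tagged = tag_entities(entities, common_corpus)
    common = [quote(e) for e, is_common in tagged if is_common]
    uncommon = [quote(e) for e, is_common in tagged if not is_common]

    # If every entity is common (or none was extracted), use the news title.
    if not uncommon:
        return title

    # Common entities are joined with AND, uncommon ones with OR.
    or_group = " OR ".join(uncommon)
    if len(uncommon) > 1:
        or_group = f"({or_group})"
    return " AND ".join(common + [or_group])


if __name__ == "__main__":
    corpus = {"pakistan", "karachi"}
    print(compose_query("Karachi violence", ["Karachi", "Pakistan"], corpus))
    print(compose_query("Some title", ["Pakistan", "WikiLeaks", "Lakki Marwat"], corpus))
```

Grouping the uncommon entities in a single OR clause is a design choice of this sketch: distinctive entities are allowed to match individually, while common ones only narrow the result set.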
4.3    User Interface
The focus of our prototype system is to detect the                             5.1    Experimental Dataset
popularity of a news topic/article over time and hence at the                  The dataset comprises the front-page news articles from
core of the user interface is a timeline. Figure 3 displays the                Dawn.com [3], a famous Pakistani newspaper’s website
timeline for 29th October, 2010.                                               which publishes news in the English language. We collect
                                                                               their articles on daily basis from 17th October to 26th
                                                                               October – we then perform the analysis using our system.
                                                                               For each day’s news data the Twitter module is run at a gap
                                                                               of one day hence providing a fair comparison for all the
                                                                               days. The statistics collected for each run include the
                                                                               number of tweets per news article per day and the ranking
                                                                               scores obtained through equation 1 in Section 4.4.

                                                                               5.2    Experimental Results
                                                                               Three sets of experimental results are shown as verification
                                                                               for each claim made in Section 1.

As mentioned in Section 4.2 the system shows the timelined
news assessment over a period of four days. The data of the
news articles taken from the original news source is shown
side by side with the popularity ranks calculated using our
Twitter module. This system simultaneously addresses the
temporal and popularity aspects within the news domain.

4.4    Ranking Approach
In this section we explain our ranking methodology for the
ranking module within the Twitter module of our system –
one easy approach could be to count the number of tweets
corresponding to each news article that are retrieved by the                   The first set of experimental results shows the percentage of
Twitter Search API. We argue however that taking into                          news discovered in tweets during the period of our study.
account social network features of Twitter is a more                           Figure 4 shows the results.
effective means of ranking the news articles.
As explained in Section 3.3 there is a follower paradigm in
Twitter – if user a choses to follow user b then user a is a
follower if we look at it from the viewpoint of user b
whereas friend if we look at it from the viewpoint of user a
– this friend is to followers relationship is what we make
use of in our ranking methodology. With the growing
popularity of Twitter as a news media as we explained in
detail in Section 3.3, many news and media outlets have
migrated to Twitter with these channels maintaining their
channels’ Twitter account for news dissemination and
propagation. One potential drawback of our proposed                                    Figure 4: Percentage of News in Tweets per Day
As is evident from the results in Figure 4, on almost every
day more than 50% of the top news in Pakistan is mentioned
on Twitter. The one exception is 24th October, 2010, when
there was a significant drop in the percentage of news in
tweets; the reason was found to be the appearance of the
news “WikiLeaks makes fresh claim about Iraq deaths”,
which shook the entire Twittersphere on an international
level. At most 67% of the news items were mentioned in
tweets, even though all of them appeared on the front page
of Dawn.com [3]. This supports our original claim that
traditional news services tend to ignore user preferences for
news consumption, and it also provides experimental
evidence for the claim that Twitter acts as an effective news
sensor for popular news topics and items.

The second set of experimental results shows the maximum
number of tweets recorded on each day for the entire period
of the experimental analysis. Figure 5 shows the results.

   Figure 5: Highest Number of Recorded Tweets per Day

As is evident from Figure 5, there is a large fluctuation in
the highest number of recorded tweets per day. This
fluctuation is explained with the help of Table 2. The table
shows the event for which the highest number of tweets was
recorded on each day. The days with a very large number of
tweets, i.e., 18th Oct., 20th Oct. and 24th Oct., each saw a
significant international event that made breaking news.
24th Oct. was also the day on which the percentage of news
in tweets shown in Figure 4 dropped to 30.8%; this is
reasonable because the news item responsible, “WikiLeaks
makes fresh claim about Iraq deaths”, was also heavily
tweeted by the Pakistani Twittersphere as it also affects
them. An important point to note, however, is that the
Dawn.com news website shows this news quite low on the
front page, which again indicates that traditional news
sources have limitations due to not taking into account the
notion of “popularity over time.” The second set of
experimental results thus provides further experimental
evidence for our claim that Twitter acts as a crucial
information gathering point in the face of events that make
breaking news.

   DATE         EVENT
   17th Oct.    Karachi violence (local)
   18th Oct.    Hopes fade for trapped Chinese miners (international)
   19th Oct.    Lakki Marwat suicide attack attempt foiled (local)
   20th Oct.    Rebels raid parliament in Grozny; 7 dead (international)
   21st Oct.    Obama to visit next year (local)
   22nd Oct.    Nuclear plant completes decade of good performance (local)
   23rd Oct.    Ghazi shrine suicide bomber back home (local)
   24th Oct.    WikiLeaks makes fresh claim about Iraq deaths (international)
   25th Oct.    Court orders Iraqi parliament back to work (international)
   26th Oct.    SCBA presidential election (local)
 Table 2: Events Corresponding to Highest Tweets in Figure 5

The third set of experimental results compares the two
ranking approaches in order to evaluate the effectiveness of
the ranking approach based on Twitter’s social graph and
our proposed hypothesis that the following-to-follower ratio
is extremely low for news outlets. We use Spearman’s
footrule, which is a widely used measure for evaluating
similarity between two ordered lists. Equation 2 shows the
computation of Spearman’s footrule for two ordered lists L1
and L2; in the formulation i is an element of these lists, and
L1(i) and L2(i) are its ranks in L1 and L2 respectively.

$$\mathrm{Spearman}(L_1, L_2) = \sum_{i} \lvert L_1(i) - L_2(i) \rvert \qquad (2)$$

We ask a regular follower of news to give us a human-
ranked list of the news items for the 4 days from 21st Oct.
to 24th Oct.; this ranked list is assumed to be perfect since it
was produced by a regular Pakistani news reader.
Spearman’s footrule is then used to evaluate the similarity
between this list and the ranked lists obtained using the
number of tweets (naive approach) and our formulation
(equation 1) as ranking functions. Table 3 shows the values
of Spearman’s footrule for both ranking approaches.
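As an illustration of the Spearman's footrule computation in equation 2, the following is a minimal sketch that assumes both lists rank exactly the same set of items, each appearing once. The function name `spearman_footrule` and the item labels are hypothetical; they are not the news items or rankings used in the study.

```python
def spearman_footrule(ranking_a, ranking_b):
    """Sum over items of |rank in ranking_a - rank in ranking_b|.

    Both arguments are lists ordered from most to least important and
    are assumed to contain the same items, each exactly once.
    """
    if len(set(ranking_a)) != len(ranking_a) or len(set(ranking_b)) != len(ranking_b):
        raise ValueError("rankings must not contain duplicates")
    if set(ranking_a) != set(ranking_b):
        raise ValueError("rankings must contain the same items")
    # Position of each item in each list; the offset (0- or 1-based)
    # cancels out because only rank differences are summed.
    pos_a = {item: rank for rank, item in enumerate(ranking_a)}
    pos_b = {item: rank for rank, item in enumerate(ranking_b)}
    return sum(abs(pos_a[item] - pos_b[item]) for item in pos_a)


# Made-up example: a reference ordering and one algorithmic ordering.
human_list = ["A", "B", "C", "D"]
algorithmic_list = ["B", "A", "D", "C"]
print(spearman_footrule(human_list, algorithmic_list))  # prints 4
print(spearman_footrule(human_list, human_list))        # prints 0
```

A value of 0 means the two orderings are identical, and larger values mean greater disagreement, which is how the per-day values for the two ranking approaches can be read.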
   DATE         NumTweets    Equation 1
   21st Oct.    4            1
   22nd Oct.    3            0
   23rd Oct.    5            1
   24th Oct.    4            1
     Table 3: Spearman Footrule Values for Both Ranking
                        Approaches

Table 3 shows that the dissimilarity between the
human-produced list of news items and the algorithmically
ranked list was higher when the number of tweets (naive
approach) is used as the ranking criterion. Our formulation
(equation 1), on the other hand, produces rankings almost
identical to the human-produced list, which supports our
experimental claim that Twitter’s social graph can give
important insights into the news that is really meaningful to
readers.

6   CONCLUSIONS
We have described a general framework for integrating the
concepts of user popularity and time within the domain of
news information retrieval. Our approach is built on top of
a very popular micro-blogging service which also plays the
role of a social network. Furthermore, Twitter provides
support for both the concepts that are crucial for news
retrieval systems, i.e., popularity and time, and we
demonstrated its effectiveness through experiments
performed on real-time news data. The work also includes a
running prototype for real-time detection of hot news topics.

Future work in this area includes several dimensions: we
plan to incorporate the third important dimension that news
retrieval systems have to deal with, i.e., the news sources.
Moreover, future directions also include refinement
approaches for the common entity corpus using a learning
approach which makes it adapt to the changing and
evolving news patterns. We also intend to extend the
prototype for Twitter’s real-time message feed for different
users by building it on top of the Twitter REST API.

REFERENCES
1. Alonso, O., Gertz, M., and Baeza-Yates, R. On the value
   of temporal information in information retrieval. In
   SIGIR Forum, Vol. 41, No. 2, pages 35-41, Dec. 2007.
2. Anderson, C.R. and Horvitz, E. Web montage: a
   dynamic personalized start page. In Proc. WWW 2002,
   ACM Press (2002), 704-712.
3. Dawn News Website http://www.dawn.com
4. Fukuhara, T., Murayama, T., and Nishida, T. Analyzing
   concerns of people using Weblog articles and real world
   temporal data. In Proc. WWW 2005, ACM Press
   (2005), 1-12.
5. Gabrilovich, E., Dumais, S., and Horvitz, E.
   Newsjunkie: providing personalized newsfeeds via
   analysis of information novelty. In Proc. WWW 2004,
   ACM Press (2004), 482-490.
6. Glance, N.S., Hurst, M., and Tomokiyo, T., BlogPulse:
   Automated trend discovery for weblogs. In WWW 2004
   Workshop on the Weblogging Ecosystem: Aggregation,
   Analysis and Dynamics, 2004.
7. Google News Timeline
   http://newstimeline.googlelabs.com
8. Hughes, A.L., and Palen, L. Twitter adoption and use in
   mass convergence and emergency events. In
   Proceedings of the 6th International Conference on
   Information Systems for Crisis Response and
   Management, 2009.
9. Java, A., Song, X., Finin, T., and Tseng, B., Why we
   Twitter: understanding microblogging usage and
   communities. In Proceedings of the 9th WebKDD and 1st
   SNA-KDD workshop on Web mining and social network
   analysis, pages 56-65, New York, NY, USA, 2007.
10. Kamba, T., Bharat, K., and Albers, M. The krakatoa
    chronicle: An interactive, personalized newspaper on
    the web. In Proc. WWW 1995, ACM Press (1995).
11. Kwak, H., Lee, C., Park, H., and Moon, S. What is
    Twitter, a social network or a news media? In Proc.
    WWW 2010, ACM Press (2010), 591-600.
12. Matthews, M., Tolchinsky, P., Blanco, R., Atserias, J.,
    Mika, P., and Zaragoza, H. Searching through time in
    the New York Times. In HCIR ’10: Proceedings of the
    4th Workshop on Human Computer Interaction and
    Information Retrieval, pages 41-44, New Brunswick,
    NJ, USA, August 2010.
13. Nielson/Net Ratings, MegaPanel Data
    http://www.naa.org/TrendsandNumbers/Newspaper-Websites.aspx
14. Page, L., Brin, S., Motwani, R. and Winograd, T. The
    PageRank Citation Ranking: Bringing Order to the Web.
    Technical report, Stanford Digital Library Technologies
    Project, 1998.
15. Pew Research Center for the People and the Press, 2008.
    Key News Audiences Now Blend Online and
    Traditional Sources – Audience Segments in a Changing
    News Environment. Pew Research Center Biennial
    News Consumption Survey.
    http://people-press.org/reports/444.pdf
16. Qureshi, M.A., Younus, A., and Rojas, F. Analyzing
    Web Crawler as Feed Forward Engine for Efficient
    Solution to Search Problem in the Minimum Amount of
    Time through a Distributed Framework. In Proceedings
    of 1st International Conference on Information Science
    and Applications, pages 1-8, August, 2010, IEEE
    Publications.
17. Sakaki, T., Okazaki, M., and Matsuo, Y. Earthquake
    shakes Twitter users: real-time event detection by social
    sensors. In Proc. WWW 2010, ACM Press (2010), 851-
    860.
18. Sankaranarayanan, J., Samet, H., Teitler, B.E.,
    Lieberman, M.D. and Sperling, J. TwitterStand: news in
    tweets. In Proceedings of the 17th ACM SIGSPATIAL
    International Conference on Advances in Geographic
    Information Systems, pages 42-51, Seattle, Washington,
    USA, November 2009.
19. Shamma, D., Kennedy, L., and Churchill, E. Tweet the
    debates: Understanding community annotation of
    uncollected sources. In Proceedings of the ACM
    International Conference on Multimedia. ACM, 2009.
20. SIMILE, http://www.simile-widgets.org/timeline/
21. Teevan, J., Ramage, D., and Morris, M.R.
    #TwitterSearch: A comparison of microblog search and
    web search. In Proceedings of the 4th International
    Conference on Web Search and Data Mining
    (WSDM ’11), Hong Kong, China, February 2011.
22. Tjong Kim Sang, E.F., and De Meulder, F. Introduction
    to the CoNLL-2003 shared task: language-independent
    named entity recognition. In Proceedings of the Seventh
    Conference on Natural Language Learning at HLT-
    NAACL 2003 - Volume 4 Human Language Technology
    Conference. Association for Computational Linguistics,
    pages 142-147, Morristown, NJ.
23. Thelwall, M. No place for news in social network web
    sites? In Online Information Review, vol. 32, pages 726-
    744, 2008.
24. Vydiswaran, V.G.V, and Chandrasekar, R. Improving
    the online news experience. In HCIR ’10: Proceedings
    of the 4th Workshop on Human Computer Interaction
    and Information Retrieval, pages 122-125, New
    Brunswick, NJ, USA, August 2010.