=Paper= {{Paper |id=Vol-1542/paper2 |storemode=property |title=Combining Collaborative Filtering and Search Engine into Hybrid News Recommendations |pdfUrl=https://ceur-ws.org/Vol-1542/paper2.pdf |volume=Vol-1542 |authors=Toon De Pessemier,Sam Leroux,Kris Vanhecke,Luc Martens |dblpUrl=https://dblp.org/rec/conf/recsys/PessemierLVM15 }} ==Combining Collaborative Filtering and Search Engine into Hybrid News Recommendations== https://ceur-ws.org/Vol-1542/paper2.pdf
content was GroupLens [14]. GroupLens used collaborative              tent sources of a different nature, such as premium content,
filtering to generate recommendations for Usenet news and             blogs, Twitter, etc. (complete overview), and the clustering
was evaluated by a public trial with users from over a dozen          of content items by topic (clearly structured).
newsgroups. This research identified some important chal-                The remainder of this paper is structured as follows. Sec-
lenges involved in creating a news recommender system.                tion 3 compares the recommendation and content retrieval
   SCENE [15] is such a news service. It stands for a SCal-           problem and indicates resemblances between the two ap-
able two-stage pErsonalized News rEcommendation system.               proaches. Section 4 discusses the architecture of our system
The system considers characteristics such as news content,            and zooms in on the data fetching, search engine, recom-
access patterns, named entities, popularity, and recency of           mender, and clustering component of the proposed system.
news items when performing recommendation. The pro-                   Section 5 provides details on the implementation, the user
posed news selection mechanism demonstrates the impor-                interaction with the system, and the user interface. Finally,
tance of a good balance between user interests, the novelty,          Section 6 draws conclusions.
and diversity of the recommendations.
   The News@hand system [5] is a news recommender which               3.    RECOMMENDATION AS A CONTENT
applies semantic-based technologies to describe and relate
news contents and user preferences in order to produce en-
                                                                            RETRIEVAL PROBLEM
hanced recommendations. This news system ensures multi-                  Content-based algorithms typically compare a represen-
media source applicability. The resultant recommendations             tation of the user profile with (the metadata of) the con-
can be adapted to the current context of interest, thereby            tent, and deliver the best matching items as recommen-
emphasizing the importance of contextualization in the do-            dations [16]. These algorithms often use relatively simple
main of news.                                                         retrieval models, such as keyword matching or the Vector
   In the CLEF NEWSREEL track [3], news recommenda-                   Space Model (VSM) with basic Term Frequency - Inverse
tion techniques could be evaluated in real-time by providing          Document Frequency (TF-IDF) weighting [17]. As such,
news recommendations to actual users that visit commercial            the matching process of content and profile in a content-
news portals. A web-based platform is used to distribute              based algorithm shows many resemblances with the content
recommendations to the users and return users’ impressions            retrieval process of a search engine.
of the recommendations to the researchers.                               Before employing the VSM and TF-IDF weighting in a
   The News Recommender Systems Challenge [22] focused                content-based algorithm, preprocessing of the content is of-
on providing live recommendations for readers of German               ten required. If the content consists of complete sentences,
news media articles. This challenge highlighted why news              the text stream must be broken up into tokens: phrases,
recommendations have not been analyzed as thoroughly as               words, symbols or other meaningful elements. Tokens that
some of the other domains such as movies, books, or mu-               belong together, e.g. United States of America or New York,
sic. Reasons for this include the lack of data sets as well           deserve special attention, and can be handled by reasoning
as the lack of open systems to deploy algorithms in. In the           based on uppercase letters and n-gram models [4]. Before
challenge, the deployed recommenders for generating news              further processing of the content, the next operation is fil-
recommendations are: Recent Recommender (based only on                tering out stop words, the most common words in a lan-
the recency of the articles), Lucene Recommender (a text              guage that typically have a limited intrinsic value. Another
retrieval system built on top of Apache Lucene), Category-            important operation is stemming, the process for reducing
based Recommender (using the article’s category), User Fil-           inflected (or sometimes derived) words to their word stem,
ter (filters out the articles previously observed by the current      or root form. In our implementation, Snowball [20] is used,
user), and Combined Recommender (a stack or cascade of                a powerful stemmer for the English language. Again, a re-
two or more of the above recommenders).                               semblance with content retrieval processes can be noticed,
   The usefulness of retrieval algorithms for content-based           since these preprocessing operations are also performed dur-
recommendations has been demonstrated with experiments                ing the indexing of web pages in search engines.
using a large data set of news content [2]. Binary and graded            Based on these similarities between the content
evaluation were compared, and graded evaluation proved                recommendation and content retrieval problems, we opted to utilize
to be intrinsically better for news recommendations. This             a search engine as the core component of our recommender
study emphasizes the potential of combining content-based             service. The user profile is used as search query and pro-
approaches with collaborative filtering into a hybrid recom-          vides the input for the search engine. Consequently, the
mender system for news.                                               search results are the content items best matching the user
   Although the various initiatives emphasize the importance          profile and can therefore be considered as personalized rec-
of a personalized news offer, most of them focus on the rec-          ommendations for the user.
ommendation algorithms. However, the way in which con-                   Utilizing a search engine to generate personalized recom-
tent is gathered, delivered, and presented to end-users is of         mendations for news content brings some additional advan-
crucial importance for a successful service. Users want an            tages.
up-to-date, personalized news offer, providing a complete                  • Short response time. Search engines are strongly opti-
overview of all news events, which is clearly structured and                 mized to quickly identify and retrieve relevant content
classified by topic. In this study, the focus is not on                      items. An inverted index [6] is used as a very efficient
improving state-of-the-art recommendation algorithms or search               structuring of the content, making it possible to handle massive
engines, since many studies covered this already [22, 3, 6,                  amounts of documents.
2]. The focus of this paper is rather on investigating the
                                                                           • Fast processing of new content. New content items can
real-time aspect of delivering personalized recommendations
                                                                             be processed quickly by making additions to the in-
(up-to-date content offer), the aggregation of multiple con-
                                                                             dex structure, thereby making these new content items
                                                                    time. As opposed to batch processing, Storm handles the
                                                                    news articles as soon as these are available. To use Storm,
                                                                    a topology composed of ‘Spouts’ and ‘Bolts’ has to be built,
                                                                    which describes how messages flow into the system and how
                                                                    they have to be processed. A Spout is a source of data
                                                                    streams. A Bolt consumes any number of data streams, does
                                                                    some processing, and can emit new data streams. Storm can
                                                                    make duplicates of these components, and even distribute
                                                                    these duplicates over multiple machines, in order to process
                                                                    large amounts of data. As a result, Storm makes the system
                                                                    scalable and distributed.
                                                                       In our implementation, the Spouts input data into the sys-
                                                                    tem as URLs of RSS-feeds, blogs, or social network accounts.
                                                                    Storm will distribute the work load over different Bolts of the
                                                                    first type, which fetch the data from the feeds. In case new
                                                                    articles are available in the feed, the URL of these articles
                                                                    is passed to the Bolts of the second type. These Bolts fetch
                                                                    the article content and remove non-topical information, such
                                                                    as advertisements, by identifying specific HTML tags in the
Figure 1: The architecture and content flow of the                  source code of the web page. Subsequently, the Bolts pass
news recommender system.                                            the article content to Bolts of the third type. The task of
                                                                    Bolts of the third type is to analyze the content and obtain
                                                                    information such as the title, date, category, etc. Next, the
       available for recommendation almost immediately. In          article content is passed to the fourth type of Bolts, which
       contrast, traditional recommender systems often re-          will input the news articles into the search engine. After
       quire intensive calculations of similarities before a new    inputting the content into the search engine, statistical in-
       item can be recommended.                                     formation about the article content is stored by the fifth
                                                                    and last type of Bolts. E.g., the frequency of occurrence of
     • Limited storage requirements. The index structure of         a term at a specific moment in time is used to determine if
       a search engine is a storage-efficient way to keep           a news topic is trending and important (Section 4.3).
       documents retrievable.
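The idea of Section 3, using the user profile as the query of a search engine, can be illustrated with a minimal sketch. This is not Lucene and not the authors' implementation: `TinyIndex`, its smoothed IDF formula, and the sample articles are assumptions made for illustration only. The sketch shows a toy inverted index in which adding a document only touches the postings of its own terms, and in which a weighted term profile is scored against the index to obtain a ranked list of items.

```python
import math
from collections import Counter, defaultdict


class TinyIndex:
    """A toy in-memory inverted index with TF-IDF scoring (illustrative, not Lucene)."""

    def __init__(self):
        self.postings = defaultdict(dict)  # term -> {doc_id: term frequency}
        self.doc_count = 0

    def add(self, doc_id, tokens):
        # Adding a document only updates the postings of its own terms,
        # so a new item is searchable right after this call returns.
        self.doc_count += 1
        for term, tf in Counter(tokens).items():
            self.postings[term][doc_id] = tf

    def idf(self, term):
        # Smoothed inverse document frequency (an assumed variant).
        df = len(self.postings.get(term, {}))
        return math.log((1 + self.doc_count) / (1 + df)) + 1.0

    def search(self, profile, top_n=10):
        # The profile (term -> interest weight) plays the role of the query.
        scores = defaultdict(float)
        for term, weight in profile.items():
            for doc_id, tf in self.postings.get(term, {}).items():
                scores[doc_id] += weight * tf * self.idf(term)
        # Highest score first; ties broken by document id for determinism.
        return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


index = TinyIndex()
index.add("a1", ["election", "vote", "parliament"])
index.add("a2", ["football", "goal", "league"])
index.add("a3", ["election", "football", "stadium"])

# Hypothetical user profile: strong interest in football, some in goals.
profile = {"football": 2.0, "goal": 1.0}
results = index.search(profile)
ranked_ids = [doc_id for doc_id, _ in results]
```

In this sketch only the documents that share at least one term with the profile are ever scored, which is the property that keeps query time low for long profile vectors.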
                                                                    4.2    Search Engine
4.     ARCHITECTURE                                                    In the second phase, the content is processed by a search
  Figure 1 shows the architecture and content flow of the           engine. We opted to use Apache Lucene [24], a Java library
news recommender system. The different components will              that is typically used for services handling large amounts
be discussed in more detail in this section.                        of data and offering search functionalities. Since Lucene’s
                                                                    performance, simplicity, and ease-of-use have been investi-
4.1     Data Fetching                                               gated in related work [12], this research does not focus on
   The first phase of the recommendation process is to fetch the    the characteristics of Lucene, but rather on the combination
news content periodically from different sources. When new          of search engine and recommender system.
items are available, their content is fetched and processed.           As alternative search engines, we considered Solr [26] and
Many online news services provide their content through             ElasticSearch [10]. Solr is a ready-to-use, open source search
RSS-feeds. To parse these feeds, the Rome project [28] is           engine based on Lucene. In comparison with Lucene, Solr
used since this is a robust parser. Besides RSS-feeds, other        provides more specific features such as a REST web interface
sources, such as blogs, can also be incorporated into the           to index and search for documents. However, the disadvan-
system by using a specific content parser.                          tage of Solr is that some of the specialized functionality is
   In order to keep track of the most recent news content,          hidden and not directly usable. Besides, the overhead of the
news sources are checked regularly for new content.                 web interface of Solr introduced some delay in comparison
Different news sources have a different publishing frequency,       with Lucene in our experiments. Similar to Solr,
ranging from one news item per day to multiple news items per       ElasticSearch hides some of Lucene’s functionality by using a simple
minute. Therefore, we used a simple mechanism to adapt              web interface. Specific information about the content items,
the frequency of checking for new content to the publishing         such as the term frequencies or statistics about the com-
frequency of the content source. For each content source,           plete index, are not directly accessible using ElasticSearch.
a dynamic timer is used to determine when to check for              Therefore, Lucene was chosen to provide the functionality of
new content. After a timeout, the content is fetched. If            the search engine. In case the processing load for the Lucene
new content is available, the content item is added to the          index becomes an issue, distribution over different machines
search engine and the timeout is reduced by half. If no new         is possible by solutions such as Katta [13], thereby making
content is available, the timeout is doubled. This simple           it scalable.
mechanism proved to be sufficient as a convergence method
for the timeout parameter.                                          4.3    Recommender
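The adaptive polling mechanism of Section 4.1 (halve the timeout when new content is found, double it otherwise) can be sketched as follows. The function name `next_timeout`, the lower and upper bounds, and the starting value of one hour are assumptions added for illustration; the text does not state whether the timeout is bounded.

```python
def next_timeout(timeout, found_new, min_timeout=60.0, max_timeout=86400.0):
    """Return the next polling timeout in seconds for one content source.

    Halve the timeout when the last check found new content, double it
    otherwise. The clamping bounds are assumed, not taken from the paper.
    """
    timeout = timeout / 2 if found_new else timeout * 2
    return max(min_timeout, min(max_timeout, timeout))


# Simulate four consecutive checks of one hypothetical feed.
timeout = 3600.0
history = []
for found_new in [True, True, False, True]:
    timeout = next_timeout(timeout, found_new)
    history.append(timeout)
```

A source that publishes at a steady rate makes the timeout oscillate around its publishing interval, which is the convergence behavior the paragraph above refers to.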
   In order to process the stream of incoming news articles of        In the third phase, personalized recommendations are gen-
different sources continuously, Apache Storm [1] was used.          erated. The user profile is used as a search query and sent to
Storm enables the processing of large streams of data in real       the search engine. The resulting search results are consid-
ered as personalized recommendations. As is common prac-           mentation will recommend profile terms that are prominent
tice in the VSM [16], the user profile is modeled as a vector      in neighboring profiles. These profile terms of the neighbors
of terms (tags) together with a value specifying the user’s        are used to extend the profile of the user, thereby making it
interest in the term. These terms are words (or N-grams) in        more diverse. Subsequently, this extended profile is used to
the article that are identified as relevant for the content. The   generate content-based recommendations using the search
current implementation is based on the traditional TF-IDF,         engine. By extending the profile of a user with terms that
but alternative solutions can easily be integrated. When the       are significant in the profiles of the user’s neighbors, profiles
user reads a news article, the profile vector is updated with      are broadened and diversified with related terms. These ex-
the TF-IDF values of the terms of the article. However, this       tended profiles will produce more diverse recommendations
update process is only executed if the user has spent more         covering a broad range of topics. Since the additional pro-
time on the article than a predefined threshold. In our im-        file terms are originating from neighbors’ profiles, the added
plementation, we have chosen 10 seconds as a minimum time          terms will probably be in the area of interest of the user.
period for users to read the title and get an impression of        The collaborative filtering component is based on the im-
the article content. More advanced approaches are possible         plementation of Apache Mahout [25]. Mahout ensures the
using the reading time and article length, but these are not       scalability of this component of the system. Moreover, the
always reliable in a mobile environment.                           profile extension is not a time-critical component, and is
   Since our system uses implicit feedback based on users’         therefore implemented as a batch process running period-
selections (see Section 5), the profile update process is a        ically. Content-based recommendations are based on the
simple summation of the item vectors of different articles.        current version of the user profile, and as soon as the pro-
Articles from the past are considered as less representative       file extension is finished, the profile is updated. This en-
for the user’s preferences than recent articles. Therefore,        sures that real-time recommendations can be generated at
the value of a term decreases exponentially as the age (in         all times.
hours) of the article increases, meaning that older items will        Finally, the publishing date of the article is also taken
contribute less to the profile. Although these terms with          into account in the recommendation process. In the current
their corresponding interest values may form a rather long         implementation, only news articles of the last two days are
profile vector, and as a result a long search query, Lucene is     candidate recommendations. However, a more intelligent
designed to handle such search requests in a very short time.      degradation over time, with a degradation rate depending
Therefore, recommendations are requested when needed and           on the category or content of the article, is left for future work.
hence always up-to-date.
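The profile update described above (a summation of the TF-IDF vectors of read articles, with an exponential decay on article age and a 10-second reading threshold) can be sketched as follows. The decay rate is not given in the text, so the `half_life_hours` parameter and its default of 24 hours are assumptions, as are the function name and the sample reading events.

```python
import math

MIN_READ_SECONDS = 10  # minimum viewing time stated in the text


def build_profile(read_events, half_life_hours=24.0):
    """Sum decayed TF-IDF item vectors into a user profile.

    read_events: iterable of (tfidf_vector, article_age_hours, seconds_viewed).
    half_life_hours is an assumed parameter; the paper only states that
    term values decrease exponentially with the article age in hours.
    """
    decay_rate = math.log(2) / half_life_hours
    profile = {}
    for vector, age_hours, seconds_viewed in read_events:
        if seconds_viewed < MIN_READ_SECONDS:
            continue  # too short to count as implicit positive feedback
        weight = math.exp(-decay_rate * max(0.0, age_hours))
        for term, value in vector.items():
            profile[term] = profile.get(term, 0.0) + weight * value
    return profile


# Hypothetical reading history of one user.
events = [
    ({"football": 0.8, "goal": 0.4}, 0.0, 42),   # fresh article, read
    ({"football": 0.6}, 24.0, 30),               # one half-life old, read
    ({"election": 0.9}, 1.0, 3),                 # skipped after 3 seconds
]
profile = build_profile(events)
```

The resulting dictionary can be passed directly as a weighted query to the search engine, so older articles influence the ranking less than recent ones.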
   News events with a high impact (e.g., a natural disaster in     4.4    Clustering
a remote part of the world) have to be detected and consid-          In the fourth phase, the recommended news items are clus-
ered as a recommendation, even if the topic does not com-          tered into topics. Since the news items in our system origi-
pletely match the user’s interests. These trending topics can      nate from different content sources, multiple items may cover
be identified based on their frequency of occurrence. If the       the same news story. To provide users with a clear overview of the
current frequency of occurrence is significantly higher than       news without removing content items, items about the same
the frequency of occurrence in the past, the topic is consid-      topic are clustered together. To cluster the content, three
ered as trending. Besides, trending topics are discovered by       clustering approaches are considered during the design.
checking trends on Google’s search queries [11]. Every hour,         1. A periodic clustering of the complete content library
Google publishes a short list with trending searches. A spe-            before generating recommendations. Traditional clus-
cial Spout was implemented to fetch these trending topics               tering algorithms, which assume that all items are known
hourly. Trending topics are used to create a query for the              before the clustering starts, can be used to periodi-
search engine, and the resulting news items are added to the            cally cluster all news items [23]. This approach does
user’s recommendation list. A final source of trending topics           not allow the recommendation process to begin before
is Twitter. Research has shown that Twitter messages are                the complete clustering of the content library is fin-
a good reflection of topical news [18]. Therefore, another              ished. Since this disadvantage introduces too much
Spout was assigned specifically to query tweets regarding               delay when adding new content to the library, it was
news topics using the Twitter API. Twitter accounts of                  not a suitable approach for our system.
specialized news services and newspapers were followed. The
tweets originating from these accounts are focusing on re-           2. An incremental clustering of the content library be-
cent news and characterized by a high quality. Retweets                 fore generating recommendations. In this approach,
and Favorites give an indication of the popularity and im-              new content items are assigned to the best matching
pact of a tweet. Subsequently, Tweets are processed in the              cluster, or a new cluster is made in case there is no
same manner as other news items by Bolts.                               match. Although this clustering approach is used in
   As stated in the introduction, straightforward collabora-            different existing systems [15, 7], we did not opt for
tive filtering is not usable for news recommendations be-               this approach because it is not personalized. For a
cause of the new item problem. Unfortunately, content-                  large content library, a large number of clusters can be
based recommendations are typically characterized by a low              identified. Since the clustering process is performed
serendipity; recommendations are too obvious. To introduce              before the recommendation process, the clusters are
serendipity, a hybrid approach was taken by adding a                    identical for all users. However, personal interests may
collaborative filtering aspect to the content-based recommender. A      require a personalized clustering of the news content.
traditional nearest neighbor approach was used to calculate
                                                                     3. A clustering of the recommended content items. This
similarities between user-user pairs. Instead of recommend-
                                                                        is the approach that is used in our system, using a hi-
ing the items that the neighbors have consumed, our imple-
                                                                        erarchical clustering algorithm. Content items are not
     clustered until the recommendation process is finished.
     The advantage of this approach is that only a small set
     of content items (250 candidate recommendations in
     our system) have to be clustered. Another advantage
     of clustering the recommendation results is the person-
     alized nature of this set. For each user, the clustering
     process will result in a different clustering. Even a dif-
     ferent level of clustering (number of clusters) can be
     chosen for every user. Users who are very interested in
     sports may find different clusters for soccer, baseball,
     cycling, etc., whereas users who are moderately inter-
     ested may receive only one sports cluster containing
     all sporting disciplines. On the downside, users may
     not be familiar with a personalized clustering. As user       Figure 2: A screenshot of the user interface of the
     preferences change or as collaborative filtering is ap-       (mobile) web application.
     plied to extend profile vectors, clusters are not stable
     over time. This behavior may surprise users who first
     got used to the existing clusters and then cannot find          Evaluating the system performance in terms of response
     their ‘old favorite’ clusters anymore.                        time gave the following results. A mean response time of 800
                                                                   ms was measured to generate 250 recommendations. This re-
5.   USER INTERACTION                                              quest includes retrieving the user profile and trending terms,
   Mobile has become, especially amongst younger media             executing the query on the search engine, and clustering the
consumers, the first gateway to most news events published         resulting items. These results were obtained on our test sys-
online. In a recent survey [21], conducted in 10 countries         tem, an Intel Xeon E5645 CPU at 2.40 GHz with 8GB of
with high Internet penetration, one-fifth of the users now         RAM running CentOS 6.6.
claim that their mobile phone is the primary access point
for news. The small screen and typical interaction methods         6.   CONCLUSIONS
of mobile devices (touch screen) induce extra challenges and          In this paper, we proposed a hybrid, real-time recom-
possibilities for news services.                                   mender system for news, combining technologies such as
   Because of this, we made our news service available as a        Storm, Lucene, and Mahout to ensure scalability and quick
web application that is usable on desktop but also on tablets      response times. Storm enables the processing of large streams
and smartphones. Figure 2 shows a screenshot of the user           of news content. Lucene provides the functionality of a
interface of the (mobile) web application, based on HTML5          search engine and is used as a content-based recommender.
and JavaScript. On the left-hand side, an overview of the          The collaborative filter of Mahout is used to exchange
recommended content items is shown. For each article, the          profile terms among neighboring users. User profile vectors are
number indicates how many articles covering this topic are         extended with related terms interesting to read about. The
clustered together. Selecting one of the items in the left         resulting hybrid recommendations are clustered according
column will show the article content on the right hand side        to their topic and presented to the user through a web ap-
using an HTML iframe. HTML iframes are used in order               plication that is optimized for mobile devices. This research
to provide all functionality of the source website, such as        discussed the possibility of combining collaborative filtering
hyperlinks, while providing users the ability to browse their      and a search engine to compose a hybrid news recommender
recommendations using the left column. Parsing the content         system, thereby combining the advantages of both. Search
of the source and reproducing it inside our own application        engines ensure a real-time response behavior while collab-
is a technically feasible alternative, but violates the terms      orative filtering adds community knowledge to the system.
of use of many websites. Redirecting the users to the source       As future work, we consider making a distinction between
website (using hyperlinks) would imply that users leave our        short-term interests and long-term interests of users. We
web application and continue their news consumption on the         also plan to focus more on entities mentioned in articles.
source website, thereby making it impossible to track their
behavior. The user interface is adapted to mobile devices
by providing a clearly readable overview of the content, and       7.   ACKNOWLEDGMENTS
interaction through tapping and swiping the touch screen.             We would like to thank Sam Leroux for the work he per-
For smaller screens, such as smartphones, the column on the        formed in the context of this research during his master the-
left hand side can be hidden to show the news articles in full     sis.
screen. Further optimizations for mobile devices and touch
screens are provided by using jQuery Mobile [27].
   Explicit feedback for news services is difficult to interpret
                                                                   8.   REFERENCES
and therefore less common. E.g., a 1-star on a 5 point rat-         [1] Apache Software Foundation. Apache storm, 2015.
ing scale can be interpreted as a disinterest for the content,          Available at http://storm.apache.org/.
or as sympathizing with a story about some tragic event.            [2] T. Bogers and A. van den Bosch. Comparing and
Therefore, our system is using implicit feedback based on               evaluating information retrieval algorithms for news
the user’s viewing behavior. If an article is selected and              recommendation. In Proceedings of the 2007 ACM
shown on the screen for at least 10 seconds, we assume that             Conference on Recommender Systems, RecSys ’07,
the user has some interest in the topic of the story.                   pages 141–144, New York, NY, USA, 2007. ACM.
 [3] T. Brodt and F. Hopfgartner. Shedding light on a          [17] C. D. Manning, P. Raghavan, H. Schütze, et al.
     living lab: The clef newsreel open recommendation              Introduction to information retrieval, volume 1.
     platform. In Proceedings of the 5th Information                Cambridge university press Cambridge, 2008.
     Interaction in Context Symposium, IIiX ’14, pages         [18] O. Phelan, K. McCarthy, and B. Smyth. Using twitter
     223–226, New York, NY, USA, 2014. ACM.                         to recommend real-time topical news. In Proceedings
 [4] P. F. Brown, P. V. deSouza, R. L. Mercer, V. J. D.             of the Third ACM Conference on Recommender
     Pietra, and J. C. Lai. Class-based n-gram models of            Systems, RecSys ’09, pages 385–388, New York, NY,
     natural language. Comput. Linguist., 18(4):467–479,            USA, 2009. ACM.
     Dec. 1992.                                                [19] L. Pizzato, T. Rej, T. Chung, I. Koprinska, and
 [5] I. Cantador, A. Bellogı́n, and P. Castells. News@hand:         J. Kay. Recon: A reciprocal recommender for online
     A semantic web approach to recommending news. In               dating. In Proceedings of the Fourth ACM Conference
     W. Nejdl, J. Kay, P. Pu, and E. Herder, editors,               on Recommender Systems, RecSys ’10, pages 207–214,
     Adaptive Hypermedia and Adaptive Web-Based                     New York, NY, USA, 2010. ACM.
     Systems, volume 5149 of Lecture Notes in Computer         [20] M. F. Porter. Snowball: A language for stemming
     Science, pages 279–283. Springer Berlin Heidelberg,            algorithms, 2001. Available at
     2008.                                                          http://snowball.tartarus.org/.
 [6] D. Cutting and J. Pedersen. Optimization for dynamic      [21] Reuters Institute for the Study of Journalism. Digital
     inverted index maintenance. In Proceedings of the 13th         News Report, 2015. Available at
     Annual International ACM SIGIR Conference on                   http://www.digitalnewsreport.org/.
     Research and Development in Information Retrieval,        [22] A. Said, A. Bellogı́n, and A. de Vries. News
     SIGIR ’90, pages 405–411, New York, NY, USA, 1990.             recommendation in the wild: Cwi’s recommendation
     ACM.                                                           algorithms in the NRS challenge. In Proceedings of the
 [7] A. S. Das, M. Datar, A. Garg, and S. Rajaram.                  2013 International News Recommender Systems
     Google news personalization: Scalable online                   Workshop and Challenge. NRS, volume 13, 2013.
     collaborative filtering. In Proceedings of the 16th       [23] K. G. Saranya and G. S. Sadhasivam. A personalized
     International Conference on World Wide Web, WWW                online news recommendation system. International
     ’07, pages 271–280, New York, NY, USA, 2007. ACM.              Journal of Computer Applications, 57(18):6–14,
 [8] T. De Pessemier, S. Coppens, K. Geebelen,                      November 2012.
     C. Vleugels, S. Bannier, E. Mannens, K. Vanhecke,         [24] The Apache Software Foundation. Apache Lucene,
     and L. Martens. Collaborative recommendations with             2015. Available at https://lucene.apache.org/.
     content-based filters for cultural activities via a       [25] The Apache Software Foundation. Apache Mahout,
     scalable event distribution platform. Multimedia Tools         2015. Available at http://mahout.apache.org/users/
     and Applications, 58(1):167–213, 2012.                         recommender/recommender-documentation.html.
 [9] T. De Pessemier, C. Courtois, K. Vanhecke,                [26] The Apache Software Foundation. Apache Solr, 2015.
     K. Van Damme, L. Martens, and L. De Marez. A                   Available at http://lucene.apache.org/solr/.
     user-centric evaluation of context-aware                  [27] The jQuery Foundation. jQuery mobile, a
     recommendations for a mobile news service.                     touch-optimized web framework, 2015. Available at
     Multimedia Tools and Applications, pages 1–29, 2015.           http://jquerymobile.com.
[10] Elastic. Elasticsearch, 2015. Available at                [28] M. Woodman. Rome, 2015. Available at https:
     https://www.elastic.co/.                                       //rometools.jira.com/wiki/display/ROME/Home.
[11] Google. Google Hourly Trends, 2015. Available at
     http:
     //www.google.com/trends/hottrends/atom/hourly.
[12] E. Hatcher and O. Gospodnetic. Lucene in action (in
     action series). 2004.
[13] Katta. Lucene & more in the cloud, 2015. Available at
     http://katta.sourceforge.net/.
[14] J. A. Konstan, B. N. Miller, D. Maltz, J. L. Herlocker,
     L. R. Gordon, and J. Riedl. Grouplens: Applying
     collaborative filtering to usenet news. Commun. ACM,
     40(3):77–87, Mar. 1997.
[15] L. Li, D. Wang, T. Li, D. Knox, and B. Padmanabhan.
     Scene: A scalable two-stage personalized news
     recommendation system. In Proceedings of the 34th
     International ACM SIGIR Conference on Research
     and Development in Information Retrieval, SIGIR ’11,
     pages 125–134, New York, NY, USA, 2011. ACM.
[16] P. Lops, M. de Gemmis, and G. Semeraro.
     Content-based recommender systems: State of the art
     and trends. In F. Ricci, L. Rokach, B. Shapira, and
     P. B. Kantor, editors, Recommender Systems
     Handbook, pages 73–105. Springer US, 2011.