<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>A Signal-Based Approach to News Recommendation</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Sirian Caldarelli</string-name>
          <email>sirian.caldarelli@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Davide Feltoni Gurini</string-name>
          <email>feltoni@dia.uniroma3.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alessandro Micarelli</string-name>
          <email>micarel@dia.uniroma3.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Giuseppe Sansonetti</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Engineering, Roma Tre University</institution>,
          <addr-line>Via della Vasca Navale 79, 00146 Rome</addr-line>,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2016</year>
      </pub-date>
      <abstract>
        <p>In this paper, we describe our research activity on an approach to personalized news recommendation that captures the temporal dynamics of the active user's interests. In such a recommender, the user profile explicitly involves the time dimension in representing her interests and preferences. Each user's interest is represented as a signal, thus characterizing its evolution over time. To this aim, a signal processing technique (i.e., the discrete wavelet transform) is adopted to represent and analyze such signals. Furthermore, we report the experimental results of a very preliminary comparative evaluation on a dataset available online. These results seem encouraging and spur us to continue developing our approach.</p>
        <p>Keywords: News recommendation; user profiling; temporal dynamics</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. INTRODUCTION</title>
      <p>
        With the development of electronics and Internet
technologies, the amount of information available online has been
constantly increasing. In such a scenario, users are overwhelmed and
increasingly feel the need to be guided in selecting the
information to pay attention to. News recommenders are
a possible solution, since they help users find the information of
possible interest to them. In order to provide personalized
suggestions, such systems rely on a representation of the
target user's interests and preferences. A vast number of user
profiling techniques have been proposed and thoroughly
evaluated [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. However, representing how users' interests evolve
over time remains a difficult challenge. In this paper, we
apply an approach to user profiling, called bag-of-signals [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ],
whose aim is to represent the diversity and the time-dependent,
evolving nature of users' interests. Based on this approach,
we built a recommender system for news articles. In
order to assess its performance, we performed a very
preliminary off-line evaluation as follows. Starting from a public
dataset, we built user profiles by extracting their interests
from news articles linked to content they generated on
social media. More specifically, we examined users' timelines
on Twitter, considering all the tweets and the related news
articles in the entire observation period. Then, we extracted
users' interests as concepts (e.g., topics) from those news articles and
represented their evolution over time as signals. For
analyzing and comparing such signals, we made use of a signal
processing tool that characterizes the frequency content of
any signal, along with its accurate location in the time
domain. A comparative evaluation with a classic approach that
completely ignores the time-dependence of users' interests
revealed the benefits of the proposed news recommender.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. BAG-OF-SIGNALS MODEL</title>
      <p>The representation of users' interests as signals requires
some definitions. We define the pseudo-document related to a
user u ∈ U (with U the set of all users) and an observation
period T as the set of all the news articles mentioned by u
in the period T:
<disp-formula><tex-math>\mathit{PseudoDoc}(u, T) = \{\, \mathit{news} \mid \mathit{user}(\mathit{news}) = u,\ \mathit{date}(\mathit{news}) \in T \,\}</tex-math></disp-formula>
The notation user(news) = u means that the user u has
mentioned that particular news article, while date(news) ∈ T
means that u has mentioned it in the period T.
An extension of the bag-of-words representation, well known
in Information Retrieval, is the bag-of-concepts model, where
concepts instead of keywords are extracted from
pseudo-documents. Concepts are entities more semantically
significant than simple keywords. We define the bag-of-concepts user
model as the following set of weighted concepts:</p>
      <p><disp-formula><tex-math>P_{BoC}(u) = \{\, (c, w(u, c)) \mid c \in C,\ u \in U \,\}</tex-math></disp-formula>
where the function w(u, c) gives the weight of the concept
c ∈ C for the user u ∈ U (with C and U the sets of concepts
and users, respectively). Then, we define the pseudo-fragment
related to a user u ∈ U in an interval Δt ∈ T as the set of
all the news articles mentioned by u in the interval Δt:
<disp-formula><tex-math>\mathit{PseudoFrag}(u, \Delta t) = \{\, \mathit{news} \mid \mathit{user}(\mathit{news}) = u,\ \mathit{date}(\mathit{news}) \in \Delta t \,\}</tex-math></disp-formula>
By analyzing a single pseudo-fragment related to an interval
Δt, it is possible to determine the signal components for the
concepts in the text fragment. A signal component f<sub>u,c,Δt</sub>
related to a user u ∈ U, a concept c ∈ C, and an interval
Δt ∈ T is determined by the number of times the concept
c occurs in the pseudo-fragment PseudoFrag(u, Δt), based
on the weighting function ω(u, c, Δt):</p>
      <p><disp-formula><tex-math>f_{u,c,\Delta t} = \omega(u, c, \Delta t)</tex-math></disp-formula>
This function is used to reduce the impact of typical
problems of Information Retrieval, which may affect the proposed
model too. More specifically, ω(u, c, Δt) takes into account
(i) the discriminating power of the concept c within the time
interval Δt, and (ii) the relevance of the same concept within
the profile of user u. We define the signal S<sub>u,c</sub> related to a user u
and a concept c as the ordered set of signal components f<sub>u,c,Δt<sub>i</sub></sub>
with Δt<sub>i</sub> ∈ T:</p>
      <p>
        <disp-formula><tex-math>S_{u,c} = [\, f_{u,c,\Delta t_1}, f_{u,c,\Delta t_2}, \ldots, f_{u,c,\Delta t_n} \,]</tex-math></disp-formula>
where T consists of n consecutive intervals Δt<sub>i</sub> of the same
length (with i = 1, 2, ..., n). As seen in the
bag-of-concepts model, a user is represented through a set of
concepts weighted according to their occurrences within the
pseudo-document. In the proposed model, a user is
represented by a set of signals related to the concepts that
appear in the pseudo-fragments concerning the user.
Furthermore, each signal is made up of an ordered set of signal
components weighted according to the weighting function.
Now, we define the bag-of-signals model of user u ∈ U as the
set of the signals related to the user u, where the components
f<sub>u,c,Δt</sub> are determined by the weighting function ω(u, c, Δt):
<disp-formula><tex-math>P_{BoS}(u) = \{\, S_{u,c} = [\, f_{u,c,\Delta t_1}, f_{u,c,\Delta t_2}, \ldots, f_{u,c,\Delta t_n} \,] \mid c \in C \,\}</tex-math></disp-formula>
Each signal contains two different kinds of information related to the
concept: temporal and quantitative. Hence, the elementary
units of the bag-of-signals representation are signals, and
they are therefore the starting point for assessing the
similarity between users. These signals show strong
discontinuities and sharp spikes. Signal processing provides an ideal
tool for representing and analyzing this kind of signal: the
wavelet transform [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. Wavelets are mathematical functions
that are localized both in time (space) and in scale
(frequency), thus providing an accurate time-scale map of
the signal. Wavelet-based analysis relies on a
prototype function, the so-called mother wavelet, whose
translated and scaled versions constitute the basis functions for
the series expansion that represents the
original signal through coefficients. Operations involving
signals can therefore be carried out, in a more
streamlined and efficient way, directly on the corresponding wavelet
coefficients. If the mother wavelet is properly selected (in
our approach we chose the Haar wavelet for its compact
support, as can be seen from Figure 1), the wavelet
transform captures signal dynamics well.
The wavelet transform can be computed quickly
(with computational cost O(n), where n is the number of
signal samples) by means of the fast discrete wavelet
transform (DWT) [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. Preliminary attempts at leveraging
wavelet theory for music and movie recommendation tasks
have been proposed [
        <xref ref-type="bibr" rid="ref3 ref4">4, 3</xref>
        ]. Having defined the bag-of-signals
model for representing user profiles, we also need to define
a method for evaluating the similarity between users.
Concretely, we considered two different similarity functions, f<sub>1</sub>
and f<sub>2</sub>.
      </p>
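      <p>As an illustration of the definitions above, the following minimal Python sketch builds per-concept signals from timestamped concept mentions and computes their Haar decomposition. It is a sketch under stated assumptions, not the implementation used in this work: the names build_signals and haar_dwt are hypothetical, each signal component is taken to be the raw mention count in its interval (the weighting function ω is not reproduced), and the signal length is assumed to be a power of two.</p>
      <preformat>
```python
import math
from collections import defaultdict


def build_signals(mentions, start, interval, n):
    """Turn one user's (concept, timestamp) mentions into per-concept signals.

    Each signal has n components, one per consecutive interval of equal length.
    Assumption: a component is the raw mention count in its interval; the
    weighting function of the model is not reproduced here.
    """
    signals = defaultdict(lambda: [0.0] * n)
    for concept, timestamp in mentions:
        index = int((timestamp - start) // interval)
        if index in range(n):  # ignore mentions outside the observation period
            signals[concept][index] += 1.0
    return dict(signals)


def haar_dwt(signal):
    """Full orthonormal Haar decomposition of a signal of power-of-two length.

    Returns (approximation, details), with details ordered from the finest
    level to the coarsest. Each pass halves the data, so the total work is
    n/2 + n/4 + ... = O(n).
    """
    approx = [float(x) for x in signal]
    details = []
    root2 = math.sqrt(2.0)
    while len(approx) > 1:
        pairs = [(approx[i], approx[i + 1]) for i in range(0, len(approx), 2)]
        details.append([(a - b) / root2 for a, b in pairs])
        approx = [(a + b) / root2 for a, b in pairs]
    return approx, details


# Hypothetical data: four mentions over an observation period of 4 intervals.
mentions = [("Rome", 0), ("Rome", 5), ("Rome", 35), ("Obama", 12)]
signals = build_signals(mentions, start=0, interval=10, n=4)
approx, details = haar_dwt(signals["Rome"])
```
      </preformat>
      <p>Because the orthonormal Haar transform preserves the sum of squared samples, the coefficients returned by haar_dwt carry the same energy as the original signal, which is one reason why comparisons can be carried out directly on the coefficients.</p>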
      <p>Given two users u<sub>1</sub>, u<sub>2</sub> and their corresponding profiles
P<sub>BoS</sub>(u<sub>1</sub>), P<sub>BoS</sub>(u<sub>2</sub>) based on the bag-of-signals
representation, the similarity function f<sub>1</sub> between those users is defined
as follows:
<disp-formula><tex-math>f_1(u_1, u_2) = \frac{\sum_{c \in C_1 \cap C_2} \varepsilon(s_{u_1,c})\, \varepsilon(s_{u_2,c})\, \mathit{temp}_{level}(s_{u_1,c}, s_{u_2,c})}{\sqrt{\sum_{c \in C_1} \varepsilon^2(s_{u_1,c})}\ \sqrt{\sum_{c \in C_2} \varepsilon^2(s_{u_2,c})}}</tex-math></disp-formula>
where s<sub>u<sub>1</sub>,c</sub> ∈ P<sub>BoS</sub>(u<sub>1</sub>) and s<sub>u<sub>2</sub>,c</sub> ∈ P<sub>BoS</sub>(u<sub>2</sub>), C<sub>1</sub> and C<sub>2</sub>
are the sets of the concepts related to the signals
belonging to P<sub>BoS</sub>(u<sub>1</sub>) and P<sub>BoS</sub>(u<sub>2</sub>), the function ε(s) expresses
the energy of the signal s, and temp<sub>level</sub>(s<sub>1</sub>, s<sub>2</sub>) is a function
that analyzes whether the signals s<sub>1</sub> and s<sub>2</sub> show similar
temporal usage patterns. The importance of a signal within the
profile is given by its energy. Given a discrete-time signal s,
limited and with real components, its energy ε(s) is defined
as follows:
<disp-formula><tex-math>\varepsilon(s) = \sum_{i=0}^{|s|} s[i]^2</tex-math></disp-formula>
The function temp<sub>level</sub> returns a value between 0 and 1,
providing a measure of how much the concepts belonging to the
two profiles have been used with similar temporal patterns. In
this way, the contribution of two concepts used in the same
intervals will be greater than the contribution of
concepts used in different intervals. The approximation A<sub>l</sub>(s)
of the signal s at the l-th level is defined by the set of
approximation coefficients of the DWT limited to the l-th level:</p>
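      <p><disp-formula><tex-math>A_l(s) = \{\, a_{l,j} \mid j = 1, \ldots, 2^l \,\}</tex-math></disp-formula>
Given two signals s<sub>1</sub> and s<sub>2</sub> and their respective
approximations at the chosen level, <inline-formula><tex-math>A_{level}(s_1) = [a^{s_1}_{1}, \ldots, a^{s_1}_{2^{level}}]</tex-math></inline-formula> and
<inline-formula><tex-math>A_{level}(s_2) = [a^{s_2}_{1}, \ldots, a^{s_2}_{2^{level}}]</tex-math></inline-formula>, the function temp<sub>level</sub>(s<sub>1</sub>, s<sub>2</sub>) is defined as
follows:
<disp-formula><tex-math>\mathit{temp}_{level}(s_1, s_2) = \frac{C(s_1, s_2)}{\sqrt{C(s_1, s_1)\, C(s_2, s_2)}}</tex-math></disp-formula>
where
<disp-formula><tex-math>C(s_1, s_2) = \sum_{i=0}^{|2^{l}|} A_{level}(s_1)[i]\, A_{level}(s_2)[i]</tex-math></disp-formula></p>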
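      <p>The temporal similarity defined above can be sketched in Python as follows. This is a minimal illustration rather than the code used in the evaluation: the names haar_approximation, correlation and temp_level are hypothetical, the level parameter is taken to mean that 2**level approximation coefficients are kept, and the signal length is assumed to be a power of two not smaller than 2**level.</p>
      <preformat>
```python
import math


def haar_approximation(signal, level):
    """Haar approximation coefficients of a signal, reduced to 2**level values.

    Each pass merges neighbouring samples pairwise with the orthonormal Haar
    low-pass filter, halving the resolution.
    """
    approx = [float(x) for x in signal]
    target = 2 ** level
    while len(approx) > target:
        if len(approx) % 2 != 0:
            raise ValueError("signal length must be a power of two")
        approx = [(approx[i] + approx[i + 1]) / math.sqrt(2.0)
                  for i in range(0, len(approx), 2)]
    if len(approx) != target:
        raise ValueError("signal length must be 2**k with k >= level")
    return approx


def correlation(a1, a2):
    """C(s1, s2): inner product of two approximation vectors."""
    return sum(x * y for x, y in zip(a1, a2))


def temp_level(s1, s2, level):
    """Normalized correlation of the level approximations of s1 and s2."""
    a1 = haar_approximation(s1, level)
    a2 = haar_approximation(s2, level)
    denom = math.sqrt(correlation(a1, a1) * correlation(a2, a2))
    return correlation(a1, a2) / denom if denom > 0 else 0.0


# Hypothetical signals: the same concept mentioned once, in different intervals.
early = [1, 0, 0, 0, 0, 0, 0, 0]
shifted = [0, 1, 0, 0, 0, 0, 0, 0]
late = [0, 0, 0, 0, 0, 0, 0, 1]

fine = temp_level(early, shifted, level=3)    # 8 coefficients: no overlap
coarse = temp_level(early, shifted, level=2)  # 4 coefficients: same bucket
far = temp_level(early, late, level=2)        # still in different buckets
```
      </preformat>
      <p>In this toy example, lowering the level makes the comparison tolerant to small time shifts: the two mentions in adjacent intervals do not overlap at the finest resolution but coincide once neighbouring intervals are merged, while mentions far apart in time remain dissimilar.</p>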
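      <p>Given two users u<sub>1</sub>, u<sub>2</sub> and their respective user profiles
P<sub>BoS</sub>(u<sub>1</sub>) and P<sub>BoS</sub>(u<sub>2</sub>) based on the bag-of-signals
representation, the similarity function f<sub>2</sub> between those users is
defined as follows:
<disp-formula><tex-math>f_2(u_1, u_2) = \frac{\sum_{c \in C_1 \cap C_2} \sum_{i} s_{u_1,c}[i]\, s_{u_2,c}[i]}{\sqrt{\sum_{c \in C_1} \sum_{i} s_{u_1,c}[i]^2}\ \sqrt{\sum_{c \in C_2} \sum_{i} s_{u_2,c}[i]^2}}</tex-math></disp-formula>
where s<sub>u<sub>1</sub>,c</sub> ∈ P<sub>BoS</sub>(u<sub>1</sub>) and s<sub>u<sub>2</sub>,c</sub> ∈ P<sub>BoS</sub>(u<sub>2</sub>), and C<sub>1</sub> and C<sub>2</sub>
are the sets of the concepts related to the signals belonging
to P<sub>BoS</sub>(u<sub>1</sub>) and P<sub>BoS</sub>(u<sub>2</sub>).</p>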
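      <p>The two similarity functions defined in this section can be sketched in Python as follows. This is only an illustrative sketch: profiles are assumed to be dictionaries mapping a concept to its signal, the names energy, cosine, f1 and f2 are hypothetical, and the temporal term of f1 is passed in as a function. The default used here, a cosine on the raw samples, corresponds to comparing the signals at the finest approximation level and is an assumption made to keep the example self-contained.</p>
      <preformat>
```python
import math


def energy(signal):
    """Energy of a discrete-time signal: the sum of its squared samples."""
    return sum(x * x for x in signal)


def cosine(s1, s2):
    """Stand-in temporal similarity: cosine of the raw samples (assumption)."""
    denom = math.sqrt(energy(s1) * energy(s2))
    return sum(x * y for x, y in zip(s1, s2)) / denom if denom > 0 else 0.0


def f1(profile1, profile2, temp=cosine):
    """Energy-weighted similarity over shared concepts, scaled by temp."""
    shared = set(profile1).intersection(profile2)
    num = sum(energy(profile1[c]) * energy(profile2[c])
              * temp(profile1[c], profile2[c]) for c in shared)
    norm1 = math.sqrt(sum(energy(s) ** 2 for s in profile1.values()))
    norm2 = math.sqrt(sum(energy(s) ** 2 for s in profile2.values()))
    return num / (norm1 * norm2) if norm1 * norm2 > 0 else 0.0


def f2(profile1, profile2):
    """Sample-wise cosine similarity over the signals of shared concepts."""
    shared = set(profile1).intersection(profile2)
    num = sum(x * y for c in shared
              for x, y in zip(profile1[c], profile2[c]))
    norm1 = math.sqrt(sum(energy(s) for s in profile1.values()))
    norm2 = math.sqrt(sum(energy(s) for s in profile2.values()))
    return num / (norm1 * norm2) if norm1 * norm2 > 0 else 0.0


# Hypothetical profiles with four intervals per signal.
alice = {"Rome": [2, 0, 0, 1], "Obama": [0, 1, 0, 0]}
bob = {"Rome": [0, 2, 1, 0]}
carol = {"Rome": [2, 0, 0, 1], "Jazz": [0, 0, 3, 0]}

same_interest_other_time = f2(alice, bob)
partly_shared = (f1(alice, carol), f2(alice, carol))
```
      </preformat>
      <p>In this example, alice and bob share a concept but mention it in disjoint intervals, so both functions return zero for that pair, whereas alice and carol obtain a positive score from the concept they mention in the same intervals.</p>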
    </sec>
    <sec id="sec-3">
      <title>3. EXPERIMENTAL EVALUATION</title>
      <p>
        In order to perform our experimental tests, we used
the dataset presented and employed in [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. This dataset was
obtained by monitoring the Twitter timelines of a sample of 20,000
English-speaking users for a given time period T.
From the original sample, the authors selected only the
1619 users who posted at least ten tweets per month and at
least 20 tweets in the whole observation period, thus
gathering more than two million tweets. From the news
articles mentioned in such tweets, concepts (i.e., entities, types,
and topics) were extracted through the OpenCalais web
service<sup>2</sup>. We associated each concept with the creation time
of the corresponding tweet, in order to localize it in time.
The whole observation period T was about three
months, so we considered the tweets (and the linked news articles)
of the first two months as the training dataset and the remaining
tweets as the testing dataset. After that, the evaluation
procedure was as follows (see Figure 2).
      </p>
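      <p>Training phase:</p>
      <list list-type="bullet">
        <list-item><p>the news articles linked to the tweets belonging to the training dataset were retrieved;</p></list-item>
        <list-item><p>the concepts extracted from such news articles were considered;</p></list-item>
        <list-item><p>a bag-of-signals profile was built for each user, using the concepts obtained in the previous step;</p></list-item>
        <list-item><p>for each user, a list of the users most similar to her was returned.</p></list-item>
      </list>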
      <p>
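        Testing phase:
        <list list-type="bullet">
          <list-item><p>the news articles linked to the tweets belonging to the testing dataset were retrieved;</p></list-item>
          <list-item><p>a pseudo-document for each user was generated from those news articles;</p></list-item>
          <list-item><p>all the pseudo-documents were indexed using the open source Lucene platform<sup>3</sup>, as proposed in [<xref ref-type="bibr" rid="ref6">6</xref>];</p></list-item>
          <list-item><p>for each pseudo-document, a list of the pseudo-documents most similar to it was returned.</p></list-item>
        </list>
        <fn id="fn2"><label>2</label><p>http://www.opencalais.com/</p></fn>
        <fn id="fn3"><label>3</label><p>https://lucene.apache.org/</p></fn>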
The performance of the recommender system was assessed
in terms of the normalized version of Discounted Cumulative
Gain (nDCG) [
        <xref ref-type="bibr" rid="ref8 ref9">8, 9</xref>
        ]. nDCG is usually truncated at a
particular rank level to emphasize the importance of the first
retrieved documents. The measure is defined as follows:
<disp-formula><label>(1)</label><tex-math>\mathit{nDCG}@n = \frac{\mathit{DCG}@n}{\mathit{IDCG}@n}</tex-math></disp-formula>
where the Discounted Cumulative Gain (DCG) is computed from
rel<sub>i</sub>, the graded relevance of the i-th result (i.e., 0
= non-significant, 1 = significant, and 2 = very significant),
and the Ideal DCG (IDCG) for a query corresponds to the
DCG measure where scores are re-sorted in monotonically
decreasing order, that is, the maximum possible DCG value for
that query. nDCG is often used to evaluate search engine
algorithms and other techniques whose goal is to order a
subset of items in such a way that highly relevant documents are
placed at the top of the list, while less important ones are
moved lower. Basically, higher values of nDCG mean that
the system output is closer to the ideal ranked output.
      </p>
      <p>
        Figure 3 shows the experimental results obtained
considering the two similarity functions f<sub>1</sub> and f<sub>2</sub> introduced above,
and the function S1 proposed in [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], which was obtained by
indexing the contents of all the news articles using Lucene.
The first two approaches, which
consider the evolution of interests over time, outperform the
last one, which instead ignores the temporal dimension.
Figure 4 reports the best results (i.e., those obtained through
the f<sub>1</sub> similarity function) when varying the nature of the
concepts represented as signals in the user profile. As
expected, bag-of-signals user profiles representing
entities as signals allow the news recommender to obtain the
best performance. Indeed, the maximum numbers of topics
and types extracted by OpenCalais are 18 and 39,
respectively, whereas there is no limit on the number of
entities extracted from news articles. In the dataset used, a
bag-of-signals user profile with entities as signals can have
more than 3500 represented concepts. Hence, the smaller
amount of information in the case of topics and types led
to worse results than those obtained using entities.
      </p>
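      <p>The nDCG@n measure used in this section can be sketched in Python as follows. The exact discount applied in our evaluation is not restated here, so the sketch assumes the classic formulation of Järvelin and Kekäläinen with a base-2 logarithm, in which the first result is not discounted and the i-th result (i > 1) is divided by log2(i). The function names and the relevance list are hypothetical.</p>
      <preformat>
```python
import math


def dcg_at_n(relevances, n):
    """Discounted Cumulative Gain over the first n graded relevances.

    Assumed discount: rank 1 is taken as is, rank i > 1 is divided by log2(i).
    """
    total = 0.0
    for rank, rel in enumerate(relevances[:n], start=1):
        total += rel if rank == 1 else rel / math.log2(rank)
    return total


def ndcg_at_n(relevances, n):
    """DCG@n divided by the DCG@n of the same scores sorted in decreasing order."""
    ideal = dcg_at_n(sorted(relevances, reverse=True), n)
    return dcg_at_n(relevances, n) / ideal if ideal > 0 else 0.0


# Hypothetical graded relevances (0, 1 or 2) of a ranked list of four results.
ranked = [2, 0, 1, 2]
score = ndcg_at_n(ranked, 4)
```
      </preformat>
      <p>A list that is already sorted by decreasing relevance obtains the maximum value of 1, while placing a highly relevant result lower in the list, as in the example, reduces the score.</p>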
    </sec>
    <sec id="sec-4">
      <title>4. CONCLUSIONS</title>
      <p>In this paper, we have presented a news recommender
system based on the bag-of-signals user model, which leverages
signal processing techniques to represent not only the
number of occurrences of the informative entities (concepts), but
also the related temporal usage patterns. The bag-of-signals user
model involves modeling the user's interests through a set
of signals and adopting suitably defined similarity
functions. More specifically, for signal analysis and
representation we employ wavelets for their
main characteristic of time-frequency localization.
In practice, the discrete wavelet transform allows us to effectively
analyze the sampled signals with different time windows.</p>
      <p>Although the experimental results on a dataset available online
are positive, this work is still at a preliminary stage
and leaves much room for future developments. For
instance, the choice of similarity function is an open issue that should be
further investigated. Starting from the bag-of-signals model,
we could explore new functions that consider the same data
in a different way, develop new aspects, and use
other tools from the signal processing domain. Moreover, we
intend to test our news recommender on real news datasets.
Finally, another interesting development could involve
sentiment analysis. Concretely, we propose to add a further
module to the described news recommender to
extract the positive, negative, or neutral opinion expressed by
the user about a given concept. In this way, the profile may
take into account not only the level and the temporal
localization of users' interests, but also their nature.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>F.</given-names>
            <surname>Abel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Gao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.-J.</given-names>
            <surname>Houben</surname>
          </string-name>
          , and
          <string-name>
            <given-names>K.</given-names>
            <surname>Tao</surname>
          </string-name>
          .
          <article-title>Analyzing temporal dynamics in Twitter profiles for personalized recommendations in the social web</article-title>
          .
          <source>In Proceedings of the 3rd International Web Science Conference, WebSci '11. ACM</source>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>G.</given-names>
            <surname>Arru</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Feltoni Gurini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Gasparetti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Micarelli</surname>
          </string-name>
          , and
          <string-name>
            <given-names>G.</given-names>
            <surname>Sansonetti</surname>
          </string-name>
          .
          <article-title>Signal-based user recommendation on Twitter</article-title>
          .
          <source>In Proceedings of the 22nd International Conference on World Wide Web, WWW '13 Companion</source>
          , pages
          <fpage>941</fpage>
          –
          <lpage>944</lpage>
          , New York, NY, USA,
          <year>2013</year>
          . ACM.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>C.</given-names>
            <surname>Biancalana</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Gasparetti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Micarelli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Miola</surname>
          </string-name>
          , and
          <string-name>
            <given-names>G.</given-names>
            <surname>Sansonetti</surname>
          </string-name>
          .
          <article-title>Context-aware movie recommendation based on signal processing and machine learning</article-title>
          .
          <source>In Proceedings of the 2nd Challenge on Context-Aware Movie Recommendation</source>
          ,
          <source>CAMRa '11</source>
          , pages
          <fpage>5</fpage>
          –
          <lpage>10</lpage>
          , New York, NY, USA,
          <year>2011</year>
          . ACM.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>F.</given-names>
            <surname>Gasparetti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Biancalana</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Micarelli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Miola</surname>
          </string-name>
          , and
          <string-name>
            <given-names>G.</given-names>
            <surname>Sansonetti</surname>
          </string-name>
          .
          <article-title>Wavelet-based music recommendation</article-title>
          . In K.-H. Krempels and J. Cordeiro, editors,
          <source>WEBIST 2012 - Proceedings of the 8th International Conference on Web Information Systems and Technologies</source>
          , Porto, Portugal, 18-21 April,
          <year>2012</year>
          , pages
          <fpage>399</fpage>
          –
          <lpage>402</lpage>
          .
          <publisher-name>SciTePress</publisher-name>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>A.</given-names>
            <surname>Graps</surname>
          </string-name>
          .
          <article-title>An introduction to wavelets</article-title>
          .
          <source>IEEE Computational Science and Engineering</source>
          ,
          <volume>2</volume>
          (
          <issue>2</issue>
          ),
          <year>1995</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>J.</given-names>
            <surname>Hannon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Bennett</surname>
          </string-name>
          , and
          <string-name>
            <given-names>B.</given-names>
            <surname>Smyth</surname>
          </string-name>
          .
          <article-title>Recommending Twitter users to follow using content and collaborative filtering approaches</article-title>
          .
          <source>In Proceedings of the fourth ACM Conference on Recommender Systems, RecSys '10</source>
          , pages
          <fpage>199</fpage>
          –
          <lpage>206</lpage>
          . ACM,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>M.</given-names>
            <surname>Harandi</surname>
          </string-name>
          and
          <string-name>
            <given-names>J. A.</given-names>
            <surname>Gulla</surname>
          </string-name>
          .
          <article-title>Survey of user profiling in news recommender systems</article-title>
          . In J. A. Gulla, B. Yu, Ö. Özgöbek, and N. Shabib, editors,
          <source>Proceedings of the 3rd International Workshop on News Recommendation and Analytics (INRA 2015) co-located with 9th ACM Conference on Recommender Systems (RecSys 2015)</source>
          , Vienna, Austria, September 20, 2015, volume
          <volume>1542</volume>
          of
          <series>CEUR Workshop Proceedings</series>
          , pages
          <fpage>20</fpage>
          –
          <lpage>26</lpage>
          . CEUR-WS.org,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>K.</given-names>
            <surname>Järvelin</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.</given-names>
            <surname>Kekäläinen</surname>
          </string-name>
          .
          <article-title>IR evaluation methods for retrieving highly relevant documents</article-title>
          .
          <source>In Proceedings of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR '00</source>
          , pages
          <fpage>41</fpage>
          –
          <lpage>48</lpage>
          , New York, NY, USA,
          <year>2000</year>
          . ACM.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>K.</given-names>
            <surname>Järvelin</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.</given-names>
            <surname>Kekäläinen</surname>
          </string-name>
          .
          <article-title>Cumulated gain-based evaluation of IR techniques</article-title>
          .
          <source>ACM Transactions on Information Systems</source>
          ,
          <volume>20</volume>
          (
          <issue>4</issue>
          ):
          <fpage>422</fpage>
          –
          <lpage>446</lpage>
          ,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>S. G.</given-names>
            <surname>Mallat</surname>
          </string-name>
          .
          <article-title>A theory for multiresolution signal decomposition: The wavelet representation</article-title>
          .
          <source>IEEE Transactions on Pattern Analysis and Machine Intelligence</source>
          , PAMI-
          <volume>11</volume>
          (
          <issue>7</issue>
          ):
          <fpage>674</fpage>
          –
          <lpage>693</lpage>
          ,
          <year>July 1989</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>