<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>OdysseusRecSys: Collaborative Filtering based on a Data Stream Management System</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Cornelius A. Ludmann</string-name>
          <email>cornelius.ludmann@uni-oldenburg.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marco Grawunder</string-name>
          <email>marco.grawunder@uni-oldenburg.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>H.-Jurgen Appelrath</string-name>
          <email>appelrath@uni-oldenburg.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Oldenburg, Department of Computer Science Escherweg 2</institution>
          ,
          <addr-line>26121 Oldenburg</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The development of algorithms for online Collaborative Filtering (CF) in the past few years enables to add new rating data to existing models. The Recommender System (RecSys) task changes from calculating recommendations from a static and nite dataset to continuously processing rating data. Instead of using stream processing frameworks to implement CF algorithms, we present a prototype that extends the open source Data Stream Management System (DSMS) Odysseus in a generic and domain-independent way. The user can build a custom RecSys that bene ts from existing DSMS features by de ning a continuous query with a declarative query language.</p>
      </abstract>
      <kwd-group>
        <kwd>Collaborative ltering</kwd>
        <kwd>data stream management system</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Instead of using a developer framework for stream-based systems like Apache
Storm, the usage of a generic and application-independent DSMS as a basis for
a stream-based RecSys has the following advantages:
{ Once the CF feature is implemented in the DSMS, users can use this function
in their queries. They can build a customized RecSys without the need of
writing code.
{ A lot of operations needed for a RecSys can be covered by existing operators.</p>
      <p>This includes the access to data sources and sinks, the calculation and
aggregation of values like the model error for the RecSys evaluation, the choice
of the top-K recommendations etc.
{ A RecSys can be enhanced by adding additional operators to the query. This
can be useful to combine di erent data sources, to pre- and post-process data
(e. g., normalization of input data), to include and process additional data
(e. g., context data or open linked data), to combine di erent model learners
to an ensemble learner etc.
{ The RecSys developer can bene t from many DSMS features like query
plan optimization, query sharing, fragmentation and distribution of queries,
scheduling etc.
2</p>
      <p>Architecture Overview of OdysseusRecSys
A stream-based RecSys interacts with its environment as follows: First, the user
application transmits rating data2 (u; i; r) for a user u that rates an item i with
rating r. Second, the user application transmits requests for recommendations
for a user u (e. g., when the user opens the recommendations page). Third, the
RecSys sends sets of recommendations back to the user application. Additionally,
the input of a RecSys can be extended by a user-feedback. In the following, we
expect the feedback in the form of new rating data. Furthermore, a continuous
evaluation of a RecSys leads to an output of a model error. This can be plotted
in a dashboard to monitor the RecSys or written to a le.</p>
      <p>To write queries for CF, we added CF-speci c logical operators to achieve an
additional abstraction level. Figure 1 gives an overview of these operators and
their combination in a simple logical query plan. On the left we see the incoming
and on the right the outgoing data streams as described above. The logical
operators and the intermediate data ow (solid lines with arrows) are depicted in
between. The CF operators are distinguished between learning, recommending,
and evaluating operators. The learning operators get learning tuples and build
models for the recommendation and evaluation. The recommending operators
use the models to generate a list of recommendations every time a request for
recommendations arrives. The evaluating operators get test data and models to
calculate a stream of error values for the predicted ratings of the test tuples.
2 In the following, we assume either an explicit rating or an implicit rating that
indicates a usage (e. g., binary rating). Alternatively, our concept can be extended to
calculate implicit ratings by the usage behaviour of the users.</p>
      <p>OdysseusRecSys: Collaborative Filtering based on a DSMS
RecRoemqumeesntsdafotirons</p>
      <p>Extract Test Data
Ratings Evaluating</p>
      <p>Test Data
Learning Data</p>
      <p>Predicted</p>
      <p>Test Data</p>
      <p>Updated Models
Predict Rating</p>
      <p>Test Prediction</p>
      <p>Window
Learning</p>
      <sec id="sec-1-1">
        <title>Learning Data Train RecSys Model</title>
        <p>Updated Models</p>
      </sec>
      <sec id="sec-1-2">
        <title>Recomm. Candidates Recommend.</title>
        <p>Recommending Candidates</p>
        <p>Predict Rating</p>
        <p>Recommend
Predicted
Candidates</p>
        <p>Model
Errors</p>
        <p>Set of
Recommend.</p>
        <p>Feedback
This generic composition can be used as a template and can be adapted and
extended for the particular application. The logical CF operators are:
Window: We use a standard time-based window operator that allows to limit the
validity of the rating data. This can be useful to learn a model that focuses only
on the newest ratings and removes outdated ratings (to handle concept drifts).
Additionally, it limits the ratings that need to be held in memory. This operator
has to be removed if all ratings should be held in memory forever.
Train RecSys Model: This operator uses rating data to build a CF model. It
outputs updated models as data stream elements when new rating data arrives.
The learning algorithms can respect temporal dynamics. This operator can use
di erent learning algorithms con gured by parameters.</p>
        <p>Recomm. Candidates: For each request for recommendations of a user, this
operator determines a set of recommendation candidates. These are usually all
items that have not been rated by the user.</p>
        <p>Predict Rating: This operator uses the models to predict the rating score for
each recommendation candidate resp. for each test tuple. It ensures a temporal
matching of models and recommendation candidates resp. test tuples. This allows
a deterministic matching of models.</p>
        <p>Recommend: This operator chooses the items that should be recommended.
These are usually the top-K items.</p>
        <p>
          Extract Test Data: To evaluate a CF algorithm, this operator outputs test data.
A simple implementation routes 10 % of the learning data as test data (hold-out).
An alternative is Interleaved Test-Then-Train (ITTT) [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ].
        </p>
        <p>Test Prediction: This operator implements an evaluation metric, e. g., RMSE.
It compares the predicted and the true rating resp. the predicted and the true
ranking position and aggregates an overall or moving average.</p>
        <p>
          Odysseus is designed to be modular and extendable. It supports an editor
for the SQL-like query language CQL [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ], the functional query language PQL
[
          <xref ref-type="bibr" rid="ref1">1</xref>
          ], and a graphical query editor, as well as a dashboard to visualize the data
and a GUI to control the DSMS. Additionally, developers can add new (domain
speci c) languages, new operators, new data source and sink connectors, and
new dashboard parts by the use of the Odysseus framework.
        </p>
        <p>We added the operators to Odysseus (Fig. 2), which allows to use them in
arbitrary queries. New transformation rules translate them to existing physical
operators. Exceptions are new physical operators (1) for model learning that
implements arbitrary learning algorithms, (2) to determine the recommendation
candidates, and (3) to predict rating scores by the use of the RecSys models.</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>Preliminary Evaluations</title>
      <p>
        We evaluated the feasibility by comparing the RMSE for the MovieLens dataset
with the results of the stream mining framework Massive Online Analysis (MOA).
To make our results comparable, we added the MOA implementation of the
BRISMF algorithm [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], removed the window operator, and calculated the
overall RMSE after each tested tuple with ITTT. The error values are exactly the
same as those of MOA, which shows that the matching of learning data, models,
and test data as well as the implementation of the evaluation operate correctly.
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Appelrath</surname>
            ,
            <given-names>H.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Geesen</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grawunder</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Michelsen</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nicklas</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Odysseus: A highly customizable framework for creating e cient event stream management systems</article-title>
          .
          <source>In: DEBS'12</source>
          . pp.
          <volume>367</volume>
          {
          <fpage>368</fpage>
          .
          <string-name>
            <surname>ACM</surname>
          </string-name>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Arasu</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Babu</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Widom</surname>
            ,
            <given-names>J.:</given-names>
          </string-name>
          <article-title>The CQL continuous query language: semantic foundations and query execution</article-title>
          .
          <source>The VLDB Journal</source>
          <volume>15</volume>
          (
          <issue>2</issue>
          ),
          <volume>121</volume>
          {
          <fpage>142</fpage>
          (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Diaz-Aviles</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Drumond</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schmidt-Thieme</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nejdl</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          :
          <article-title>Real-time top-n recommendation in social streams</article-title>
          .
          <source>In: ACM RecSys</source>
          . pp.
          <volume>59</volume>
          {
          <fpage>66</fpage>
          .
          <string-name>
            <surname>ACM</surname>
          </string-name>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Gama</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zliobaite</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Biefet</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pechenizkiy</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bouchachia</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>A Survey on Concept Drift Adaptation</article-title>
          .
          <source>ACM Comp. Surveys</source>
          <volume>1</volume>
          (
          <issue>1</issue>
          ) (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Takacs</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pilaszy</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nemeth</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tikk</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Scalable collaborative ltering approaches for large recommender systems</article-title>
          .
          <source>JMLR 10</source>
          ,
          <issue>623</issue>
          {
          <fpage>656</fpage>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>