=Paper= {{Paper |id=None |storemode=property |title=Open Source Recommendation Systems for Mobile Application |pdfUrl=https://ceur-ws.org/Vol-676/paper9.pdf |volume=Vol-676 |dblpUrl=https://dblp.org/rec/conf/recsys/SouzaCK10 }} ==Open Source Recommendation Systems for Mobile Application== https://ceur-ws.org/Vol-676/paper9.pdf
Open Source Recommendation Systems for Mobile Application

Renata Ghisloti De Souza
LISITE-ISEP
28 rue Notre Dame Des Champs
75006 Paris
renata.ghisloti@isep.fr

Raja Chiky
LISITE-ISEP
28 rue Notre Dame Des Champs
75006 Paris
raja.chiky@isep.fr

Zakia Kazi Aoul
LISITE-ISEP
28 rue Notre Dame Des Champs
75006 Paris
zakia.kazi@isep.fr

ABSTRACT
The aim of Recommender Systems is to suggest useful items to users. Three major techniques can be highlighted in these systems: Collaborative Filtering, Content-Based Filtering and Hybrid Filtering. The collaborative method proposes recommendations based on what a group of users has enjoyed, and it is widely used in Open Source Recommender Systems. The work presented in this paper takes place in the context of the SoliMobile Project, which aims to design, build and implement a package of innovative services focused on individuals in unstable situations (unemployment, homelessness, etc.). In this paper, we present a study of open source recommender systems and their usefulness for SoliMobile. The paper also presents how our recommender system is fed by extracting implicit ratings using Web Usage Mining techniques.

Categories and Subject Descriptors
H.3.4 [Systems and Software]: Performance evaluation (efficiency and effectiveness); H.2.8 [Database applications]: Data mining - Web usage mining

General Terms
Algorithms, Experimentation, Theory

Keywords
Open source recommender systems, collaborative filtering, Mahout, Web usage mining

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
Copyright is held by the author/owner(s). Workshop on the Practical Use of Recommender Systems, Algorithms and Technologies (PRSAT 2010), held in conjunction with RecSys 2010. September 30, 2010, Barcelona, Spain.

1. INTRODUCTION
The amount of information on the web has greatly increased in the past decade. This phenomenon has promoted the advance of the recommender systems research area. These systems intend to help users by providing useful suggestions to them. They may suggest items in different manners, such as comparing the user's tastes with those of other users or comparing the user's preferences with item descriptions. These two methods are the so-called collaborative filtering [1] and content-based filtering [10]. The collaborative method presents advantages over the content-based one: it is more efficient in practice and simpler to implement. For this reason, the majority of open source projects choose it. Current open source recommender system projects are usually built on the item-based approach, a type of collaborative filtering. They differ in programming language, extent of documentation and size of the project.

In this paper, we give an overview of known recommendation techniques and we analyze open source projects in this field of research. Our interest in recommender systems stems from the need to choose one of the studied systems and to integrate it into a complex platform that includes a Web platform, a personalization system and a mobile interface. This platform is developed through the SoliMobile project, funded by ProximaMobile [12].

The SoliMobile project, in which we are involved, aims to provide a portal of services that helps and assists people in various unstable situations. This project provides end users with information that facilitates access to charity services from anywhere. The portal has to offer services adapted to each user profile, taking into account their preferences and navigation traces. Our work aims to provide the user with a recommendation of items (services) based on the profile. The recommendation's main function is to aggregate content from different sources and from the mobile Web portal, and to customize the presentation of services for each user according to his or her profile. It allows services to be classified or restricted to a selection that fits the user profile.

The remainder of this paper is organized as follows. Section 2 describes the context of the work presented in this paper. In Section 3, we present the global architecture of the SoliMobile Project. Section 4 details the analysis of existing Open Source Recommender Systems. The recommender system used in the project is explained in Section 5. Section 6 presents the utility of Web usage mining in the recommendation. Finally, Section 7 concludes this paper and gives an outlook on our ongoing and future research in this area.

2. WORK CONTEXT
The work presented in this paper fits in a collaborative project that aims to design, develop and implement a set of innovative services focused on persons in situations of instability or emerging from instability, in order to help them find useful information such as jobs, housing offers, welfare, or medical assistance. The charity association that is a partner in the project observed that a large majority of people in unstable situations own a mobile phone, which they consider a link with family, friends or society. The project aims to make it easier for vulnerable people to access charity services using their mobile phone from anywhere. However, not every service suits every person in a precarious situation. For example, a single mother needs child services such as pediatrics or nursery, while an unemployed person needs services to find a job or professional training.

Our role in this project is to enhance and customize services for users. In this context, we deal with the implementation of algorithms that adapt the platform services to the user profile, filter them and show only items that may be of interest. Personalization according to the user profile is based on data available on the platform (e.g. databases), on the features and traces of user navigation, and also on the social environment of the user (collaborative filtering approach).

Typically, adaptation to the user profile will consolidate the resources (services) to target only the relevant users. Conversely, user profiles will also be ordered to form homogeneous groups in order to assign them to a given resource.

3. GLOBAL ARCHITECTURE
We present in this section the overall architecture of the application, illustrated in Figure 1, in order to show the role of the recommendation in the SoliMobile platform. End users create an account via the Web platform, where many services are provided. The traces of Web browsing (also called logs) are collected from servers to feed the recommender system. These navigation traces will be used to create the user-item ratings matrix. Services play the role of items. Information regarding the user profile such as age, address, occupation or preferences, as well as information concerning the description of services such as the category of services (health, employment, child care, etc.) and their addresses, will be provided as input to the recommender system. These inputs will be sent in XML format through Web services. Once the ratings matrix is constructed, the recommendation is made to categorize and customize the layout of proposed services on the mobile phone. The recommendation system will provide as output an XML file that contains a subset of sorted services to be transmitted to the mobile. Traces of mobile browsing will also be used as input to the recommender system to improve results; they can also serve as feedback to our system.

Figure 1: Global architecture of SoliMobile Project

Our goal is structured along the following lines:

• Construct a generic model for the user profile and also for structural and semantic information of the application in order to integrate new data when needed;

• Select, automatically and dynamically, variables describing the user, the services and the log navigation that improve the quality of the recommendation;

• Ensure the proper functioning of the recommender system when a new user registers whose profile is poor (or nonexistent), or when new services (items) are created that no one (in our data set) has yet rated or visited. This problem is well known in the field of information filtering and is referred to as the "Cold Start" problem. Most solutions for the cold-start problem [Lam et al. 2008] are not suitable, as they require users to rate items.

• Develop a generic recommender system, i.e. one that adapts to any application. The challenge is to design a real-time recommender system that filters resources dynamically depending on variation in user interests but also on variation in the environment. The idea is to associate with each resource a ranking based on the user profile and its context. For that, we use incremental learning techniques [3] and data stream mining techniques [4], which require a limited number of passes over the data and process data on the fly. Using these methods improves computation time and memory usage, so we can ensure robustness and scalability of the system;

• Define satisfactory indicators in order to assess the quality of the recommendation;

• Build a software platform integrating all the tools developed during the project.

Given the short duration of the project (18 months), we decided to study open source recommender systems. Thus, we present in the following section the related state of the art.

4. OPEN-SOURCE RECOMMENDER SYSTEMS
The growth of Web content and the expansion of e-commerce have greatly increased interest in recommender systems. This has led to the development of some open source projects in the area. Among the recommender system projects available on the Web, we can distinguish the following: Duine [5], Apache Mahout [9], OpenSlopeOne [11], Cofi [2], SUGGEST [13] and Vogoo [14]. All of these projects offer collaborative-filtering implementations, in different programming languages.

The Duine Framework also supplies a hybrid implementation. It is Java software that combines content-based and collaborative filtering in a switching engine: it dynamically switches between each prediction given the current
state of the data. For example, if few ratings are available, it uses the content-based approach, and switches to the collaborative one when the scenario changes. It also provides an Explanation API, which can be used to create user-friendly recommendations, and a demo application with a Java client example.

Mahout is a Java framework in the data mining area. It has incorporated the Taste recommender system, a collaborative engine for personalized recommendations. Vogoo is a PHP framework that implements a collaborative filtering recommender system. It also includes a Slope One implementation.

A Java version of the Collaborative Filtering method is implemented in the Cofi library. It was developed by Daniel Lemire [6], the creator of the Slope One algorithms. There is also a PHP version available on Lemire’s webpage. OpenSlopeOne offers a Slope One implementation in PHP that focuses on performance.

SUGGEST is a recommendation library made by George Karypis and distributed in a binary format.

Analyzing software in the recommendation area is not a simple task, since it is difficult to define measurement standards. In this work, we propose some evaluation criteria: types of recommendation implemented by the project, programming language, level of documentation and magnitude of the project.

The documentation was evaluated based on its volume and clarity. The volume of documentation provided by Mahout and Duine is remarkably larger than that of the other systems. Both offer installation and usage guides and come with a demonstration example. It must be taken into account that OpenSlopeOne and Cofi are smaller projects, and thus their documentation tends to be smaller. The Downloads column of Table 1 represents the magnitude of the project: it gives the number of times the software, in any version, was downloaded from its source. Although Mahout does not publish this number, its very active mailing lists show that it is widely used software.

The two projects that stood out were Apache Mahout and Duine. We installed and tested them in order to verify which one was more applicable to our work. Both of them are based on Java technology and present a demonstration example with the Movielens data set. The fact that Mahout is a larger project and has multiple machine-learning algorithms made it more interesting for our research. Its modular structure also encouraged us to choose it.

5. APACHE MAHOUT
Apache Mahout is a solid project in the Data Mining area. It is a framework that features various scalable machine-learning algorithms. It is programmed in the Java language and built with the Maven project manager. In April 2008, it incorporated the Taste Recommender System, a Java framework for providing personalized recommendations. Besides Taste, it also offers clustering algorithms and a Map Reduce implementation.

Taste is a very consistent and flexible collaborative filtering engine and supports the user-based, item-based and Slope One recommender systems. It can be easily modified thanks to its well-structured module abstractions. The package defines four main abstractions: DataModel, the similarity interfaces (UserSimilarity and ItemSimilarity), UserNeighborhood and Recommender.

With these interfaces, it is possible to adapt the framework to read different types of data, personalize the recommendation or even create new recommendation methods.

The UserSimilarity and ItemSimilarity abstractions are responsible for calculating the similarity between a pair of users or items. Their functions return a value indicating the level of resemblance, with 1 being the most similar possible (the range documented by Taste is -1 to 1).

Access to the data set is made through the DataModel interface. It is possible to retrieve and store the data from databases or from filesystems (MySQLJDBCDataModel and FileDataModel respectively). The functions developed in this interface are used by the Similarity abstraction to help compute the similarity.

The main interface in Taste is Recommender. It is responsible for actually making the recommendations to the user by comparing items or by determining users with similar taste (item-based and user-based techniques). The Recommender accesses the similarity interface and uses its functions to compare a pair of users or items. It then collects the highest similarity values to offer as recommendations.

The UserNeighborhood is an assistant interface that helps to define the neighborhood in the user-based recommendation technique. It is known that, for larger data sets, the item-based technique provides better results. For that reason, many companies, such as Amazon [7], choose this approach. The Mahout framework is no different: the item-based method generally runs faster and provides more accurate recommendations.

In our project, we chose to adapt the Slope One approach (a type of item-based algorithm) to our problem. Here follows a simple Java application example of how to initiate a recommendation with the Slope One technique:

    import java.io.File;
    import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
    import org.apache.mahout.cf.taste.impl.recommender.CachingRecommender;
    import org.apache.mahout.cf.taste.impl.recommender.slopeone.SlopeOneRecommender;
    import org.apache.mahout.cf.taste.model.DataModel;
    import org.apache.mahout.cf.taste.recommender.Recommender;

    // Load the ratings file, build the Slope One recommender, then cache its results.
    DataModel model = new FileDataModel(new File("data.txt"));
    Recommender recommender = new SlopeOneRecommender(model);
    Recommender cachingRecommender = new CachingRecommender(recommender);

The challenge in adapting this approach to our project was that our input data file was available in the XML format, a type not handled by Mahout. We therefore had to add support for another file type behind the DataModel interface, and created a program that handles the XML input files. To test this new data handler, we used the Movielens data set. A pack with one million ratings was converted to XML to be used as an example. With this data set in XML form, the Slope One algorithm runs in less than one minute.

6. WEB USAGE MINING FOR RECOMMENDATION
One objective of the SoliMobile project is to develop a recommender system that is as unintrusive as possible. This implies that the system relies only on information that the user is free to provide (explicit data) and must otherwise run properly on alternatives such as implicitly mined data. To meet this need, we are studying how to add Web browsing analysis to the recommender system, as done in [8]. Web browsing analysis becomes almost necessary for extracting and understanding user behaviors.
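
As an illustration of how navigation traces could feed the user-item ratings matrix, here is a minimal Java sketch. It rests on assumptions that are not part of the SoliMobile design: each log entry is taken to be already reduced to a (user id, service id) pair, and the implicit rating is taken to be the visit count scaled to the range 1 to 5 relative to the user's most visited service. The class name ImplicitRatings and its methods are hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: derives implicit ratings from (userId, serviceId) click pairs.
class ImplicitRatings {

    /** Counts visits per (user, service); each click is {userId, serviceId}. */
    static Map<String, Map<String, Integer>> countVisits(List<String[]> clicks) {
        Map<String, Map<String, Integer>> counts = new HashMap<>();
        for (String[] click : clicks) {
            counts.computeIfAbsent(click[0], u -> new HashMap<>())
                  .merge(click[1], 1, Integer::sum);
        }
        return counts;
    }

    /** Assumed scheme: rating = 1 + 4 * count / (user's highest count). */
    static Map<String, Map<String, Double>> toRatings(
            Map<String, Map<String, Integer>> counts) {
        Map<String, Map<String, Double>> ratings = new HashMap<>();
        for (Map.Entry<String, Map<String, Integer>> user : counts.entrySet()) {
            int max = 0;
            for (int c : user.getValue().values()) {
                max = Math.max(max, c);
            }
            Map<String, Double> row = new HashMap<>();
            for (Map.Entry<String, Integer> visit : user.getValue().entrySet()) {
                row.put(visit.getKey(), 1.0 + 4.0 * visit.getValue() / max);
            }
            ratings.put(user.getKey(), row);
        }
        return ratings;
    }

    public static void main(String[] args) {
        // Sample traces: user u1 opens "jobs" four times and "housing" once.
        List<String[]> clicks = Arrays.asList(
            new String[] {"u1", "jobs"}, new String[] {"u1", "jobs"},
            new String[] {"u1", "jobs"}, new String[] {"u1", "jobs"},
            new String[] {"u1", "housing"}, new String[] {"u2", "health"});
        System.out.println(toRatings(countVisits(clicks)));
    }
}
```

Scaling per user keeps heavy and light users comparable; other signals mentioned in the literature, such as time spent on a page, could replace the visit count without changing the structure.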
Project        Implementation                       Language   Documentation   Downloads
Mahout         Item-based, User-based, Slope One    Java       High            Not available
Duine          User-based, Content Filtering        Java       High            1,113
Cofi           Item-based                           Java       Low             Not available
OpenSlopeOne   Slope One                            PHP        Low             653
Vogoo          Slope One, Item-based                PHP        Medium          2,128
SUGGEST        Item-based, User-based               C          Medium          Not available

Table 1: Utilisation ratio of each method.
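
Slope One appears in the Implementation column for several of the projects above and is the technique adopted for SoliMobile. To make the idea concrete, the following is a minimal plain-Java sketch of the weighted Slope One scheme described in [6]. It is not the Mahout implementation: the class name SlopeOneSketch, its methods and the sample ratings are hypothetical and chosen only for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, self-contained sketch of weighted Slope One (not Mahout's code).
class SlopeOneSketch {
    // diffs.get(i).get(j): average of (rating of i - rating of j) over co-rating users.
    private final Map<String, Map<String, Double>> diffs = new HashMap<>();
    // freqs.get(i).get(j): number of users who rated both i and j.
    private final Map<String, Map<String, Integer>> freqs = new HashMap<>();

    SlopeOneSketch(Map<String, Map<String, Double>> ratingsByUser) {
        // First pass: sum the differences and count the co-ratings.
        for (Map<String, Double> userRatings : ratingsByUser.values()) {
            for (Map.Entry<String, Double> a : userRatings.entrySet()) {
                for (Map.Entry<String, Double> b : userRatings.entrySet()) {
                    if (a.getKey().equals(b.getKey())) {
                        continue;
                    }
                    diffs.computeIfAbsent(a.getKey(), k -> new HashMap<>())
                         .merge(b.getKey(), a.getValue() - b.getValue(), Double::sum);
                    freqs.computeIfAbsent(a.getKey(), k -> new HashMap<>())
                         .merge(b.getKey(), 1, Integer::sum);
                }
            }
        }
        // Second pass: turn the summed differences into averages.
        for (Map.Entry<String, Map<String, Double>> row : diffs.entrySet()) {
            for (Map.Entry<String, Double> cell : row.getValue().entrySet()) {
                int n = freqs.get(row.getKey()).get(cell.getKey());
                cell.setValue(cell.getValue() / n);
            }
        }
    }

    /** Predicts the user's rating for item; NaN if no co-rated item is known. */
    double predict(Map<String, Double> userRatings, String item) {
        Map<String, Double> itemDiffs = diffs.get(item);
        if (itemDiffs == null) {
            return Double.NaN;
        }
        double numerator = 0.0;
        int denominator = 0;
        for (Map.Entry<String, Double> rated : userRatings.entrySet()) {
            Double diff = itemDiffs.get(rated.getKey());
            if (diff == null) {
                continue; // the target item itself, or no co-rating available
            }
            int n = freqs.get(item).get(rated.getKey());
            numerator += (diff + rated.getValue()) * n;
            denominator += n;
        }
        return denominator == 0 ? Double.NaN : numerator / denominator;
    }

    private static Map<String, Double> row(Object... itemThenRating) {
        Map<String, Double> r = new HashMap<>();
        for (int i = 0; i < itemThenRating.length; i += 2) {
            r.put((String) itemThenRating[i], (Double) itemThenRating[i + 1]);
        }
        return r;
    }

    public static void main(String[] args) {
        Map<String, Map<String, Double>> data = new HashMap<>();
        data.put("john", row("A", 5.0, "B", 3.0, "C", 2.0));
        data.put("mark", row("A", 3.0, "B", 4.0));
        data.put("lucy", row("B", 2.0, "C", 5.0));
        SlopeOneSketch model = new SlopeOneSketch(data);
        System.out.println(model.predict(data.get("lucy"), "A"));
    }
}
```

In the sample data, item A is rated on average 0.5 higher than B (two users) and 3 higher than C (one user), so the weighted prediction of A for the third user is ((0.5 + 2) * 2 + (3 + 5) * 1) / 3 = 13/3, about 4.33.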



In recent years, Web usage mining has become an important issue in the field of data mining. Web usage mining focuses on predicting and learning users' preferences on the Internet. Generally, the data for Web usage mining are the user interactions on the web, usually residing on Web clients, Web servers, and proxy servers. The aim of Web usage mining is to analyze user behavior through analysis of the user's interaction with the Web platform. This analysis focuses in particular on all the user's clicks while visiting the web application (also known as clickstream analysis). The interest of Web usage mining in our framework is to enrich the input of the recommender system with user data extracted from the raw clickstream data, in order to refine the user profiles and behavioral patterns. The analysis of Web logs can also be used as implicit feedback from the user, which makes it possible to assess the performance of the models involved in the recommender system.

Web logs change over time for several reasons: an update of the Web application content or structure, a change in the user preferences, a change in the execution context, etc. This is why it is important to take the temporal dimension into account in the analysis of Web usage. To consider the temporal data in a dynamic way, we plan to use data stream mining techniques. By definition, a data stream is a real-time, continuous, ordered (implicitly by arrival time or explicitly by timestamp) sequence of items. It is impossible to control the order in which items arrive, nor is it feasible to locally store a stream in its entirety. Therefore, all processing has to be done in one pass. Several techniques for mining data streams have emerged, such as CluStream for clustering, StreamSamp for sampling, VFDT for incremental decision trees, etc. The reader may refer to [4] for more explanations on these different techniques.

7. CONCLUSIONS
In this paper, we presented the problem that we deal with in the SoliMobile project. Then, we presented the global architecture that is under development in this project. This architecture includes a recommender system to customize the services offered to users based on their profile and their browsing history. Given the limited duration of the project, we opted for an open source recommender system that is modular, in order to easily integrate future developments, in particular the use of Web usage mining to address the problem of cold start.

In this paper, we also discussed several points concerning the treatment of the temporal dimension in data analysis. The raised issues demonstrate the need for defining new methods or adapting existing methods for extracting knowledge and monitoring changing and evolving data. Although there are many efficient methods for extracting knowledge, few studies have been devoted to the issue of temporally evolving data.

8. REFERENCES
[1] John S. Breese, David Heckerman, and Carl Kadie. Empirical analysis of predictive algorithms for collaborative filtering. pages 43–52, 1998.
[2] Cofi. http://www.nongnu.org/cofi/.
[3] Antoine Cornuéjols. Getting order independence in incremental learning. In ECML ’93: Proceedings of the European Conference on Machine Learning, pages 196–212, London, UK, 1993. Springer-Verlag.
[4] Baptiste Csernel, Fabrice Clerot, and Georges Hébrail. Streamsamp: Datastream clustering over tilted windows through sampling. ECML PKDD 2006: the International Workshop on Knowledge Discovery from Data Streams (IWKDDS-2006), 2006.
[5] Duine. http://www.duineframework.org/.
[6] Daniel Lemire and Anna Maclachlan. Slope one predictors for online rating-based collaborative filtering. 2005.
[7] Greg Linden, Brent Smith, and Jeremy York. Amazon.com recommendations: Item-to-item collaborative filtering. IEEE Internet Computing, 7(1):76–80, 2003.
[8] Jiahui Liu, Peter Dolan, and Elin Rønby Pedersen. Personalized news recommendation based on click behavior. In IUI ’10: Proceeding of the 14th international conference on Intelligent user interfaces, pages 31–40, New York, NY, USA, 2010. ACM.
[9] Mahout. http://mahout.apache.org/.
[10] Raymond J. Mooney and Loriene Roy. Content-based book recommending using learning for text categorization. In DL ’00: Proceedings of the fifth ACM conference on Digital libraries, pages 195–204, New York, NY, USA, 2000. ACM.
[11] OpenSlopeOne. http://code.google.com/p/openslopeone/.
[12] ProximaMobile. http://www.proximamobile.fr/.
[13] Suggest. http://glaros.dtc.umn.edu/gkhome/suggest/overview.
[14] Vogoo. http://www.vogoo-api.com/.