=Paper=
{{Paper
|id=None
|storemode=property
|title=Open Source Recommendation Systems for Mobile Application
|pdfUrl=https://ceur-ws.org/Vol-676/paper9.pdf
|volume=Vol-676
|dblpUrl=https://dblp.org/rec/conf/recsys/SouzaCK10
}}
==Open Source Recommendation Systems for Mobile Application==
Open Source Recommendation Systems for Mobile Application

Renata Ghisloti De Souza, LISITE-ISEP, 28 rue Notre Dame Des Champs, 75006 Paris, renata.ghisloti@isep.fr
Raja Chiky, LISITE-ISEP, 28 rue Notre Dame Des Champs, 75006 Paris, raja.chiky@isep.fr
Zakia Kazi Aoul, LISITE-ISEP, 28 rue Notre Dame Des Champs, 75006 Paris, zakia.kazi@isep.fr

ABSTRACT
The aim of Recommender Systems is to suggest useful items to users. Three major techniques can be highlighted in these systems: Collaborative Filtering, Content-Based Filtering and Hybrid Filtering. The collaborative method proposes recommendations based on what a group of users has enjoyed, and it is widely used in Open Source Recommender Systems. The work presented in this paper takes place in the context of the SoliMobile Project, which aims to design, build and implement a package of innovative services focused on individuals in unstable situations (unemployment, homelessness, etc.). In this paper, we present a study of open source recommender systems and their usefulness for SoliMobile. The paper also presents how our recommender system is fed by extracting implicit ratings using the techniques of Web Usage Mining.

Categories and Subject Descriptors
H.3.4 [Systems and Software]: Performance evaluation (efficiency and effectiveness); H.2.8 [Database applications]: Data mining - Web usage mining

General Terms
Algorithms, Experimentation, Theory

Keywords
Open source recommender systems, collaborative filtering, Mahout, Web usage mining

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Copyright is held by the author/owner(s). Workshop on the Practical Use of Recommender Systems, Algorithms and Technologies (PRSAT 2010), held in conjunction with RecSys 2010. September 30, 2010, Barcelona, Spain.

1. INTRODUCTION
The amount of information on the web has greatly increased in the past decade. This phenomenon has promoted the advance of the recommender systems research area. These systems intend to help users by providing useful suggestions to them. They may suggest items in different manners, such as comparing the user's taste with other users' tastes or comparing the user's preferences with the items' descriptions. These two methods are the so-called collaborative filtering [1] and content-based filtering [10]. The collaborative method presents advantages over the content-based one. It is more efficient in practice and simpler to implement. Due to this fact, the majority of open source projects choose it. Current open source recommender system projects are usually built on the item-based approach, a type of collaborative filtering. These projects vary in programming language, extent of documentation and magnitude.

We give in this paper an overview of known recommendation techniques and we analyze open source projects in this field of research. Our interest in recommender systems is justified by the fact that we have to choose one of the studied systems and integrate it in a complex platform that includes a Web platform, a personalization system and a mobile interface. This platform is developed through the SoliMobile project, funded by ProximaMobile [12].

The SoliMobile project, in which we are involved, aims to provide a portal of services that helps and assists people who are in different unstable situations. This project provides end users with information that facilitates access to charity services from anywhere. The portal has to offer services adapted to each user profile, taking into account the user's preferences and navigation traces. Our work aims to provide the user with a recommendation of items (services) based on the profile. The recommendation's main function is to aggregate content from different sources and from the mobile Web portal, and to customize the presentation of services for each user according to his profile. It allows classification or restriction of services into a selection that fits the user profile.

The remainder of this paper is organized as follows. Section 2 describes the context of the work presented in this paper. In Section 3, we present the global architecture of the SoliMobile Project. Section 4 details the analysis of existing Open Source Recommender Systems. The recommender system used in the project is explained in Section 5. Section 6 presents the utility of web usage mining in the recommendation. Finally, Section 7 concludes this paper and gives an outlook upon our ongoing and future research in this area.

2. WORK CONTEXT
The work presented in this paper fits in a collaborative project that aims to design, develop and implement a set of innovative services focused on persons in situations of instability or emerging from instability, in order to help them find useful information such as jobs, offers of housing, welfare, or medical assistance. The charity association partner in the project observed that a large majority of people in unstable situations own a mobile phone, which is considered a link with family, friends or society. The project aims to make it easier for vulnerable people to access charity services using their mobile phone from anywhere. However, not every service is suitable for every person in a precarious situation. For example, a single mother needs child services such as pediatrics or nursery, while an unemployed person needs services to find a job or professional training.

Our role in this project is to enhance and customize services for users. In this context, we deal with the implementation of algorithms that adapt the platform services to the user profile, filter them and show only items that may be of interest. Personalization according to user profile is based on data available on the platform (e.g. databases), on the features and traces of user navigation, and also on the social environment of the user (collaborative filtering approach). Typically, adaptation to the user profile will consolidate the resources (services) to target only the relevant users. Conversely, user profiles will also be ordered to form homogeneous groups in order to assign them to a given resource.

3. GLOBAL ARCHITECTURE
We present in this section the overall architecture of the application, illustrated in Figure 1, in order to show the role of the recommendation in the SoliMobile platform. End users create an account via the Web platform where many services are provided. The traces of Web browsing (also called logs) are collected from servers to feed the recommender system. These navigation traces will be used to create the user-item ratings matrix. Services play the role of items. Information regarding the user profile, such as age, address, occupation or preferences, as well as information concerning the description of services, such as the category of services (health, employment, child care, etc.) and their addresses, will be provided as input to the recommender system. These inputs will be sent in XML format through Web services. Once the ratings matrix is constructed, the recommendation is made to categorize and customize the layout of proposed services on the mobile phone. The recommendation system will provide as output an XML file that contains a subset of sorted services to be transmitted to the mobile. Traces of mobile browsing will also be used as input to the recommender system to improve results; they can also serve as feedback to our system.

Figure 1: Global architecture of SoliMobile Project

Our goal is structured along the following lines:

• Construct a generic model for the user profile and also for structural and semantic information of the application in order to integrate new data when needed;
• Select, automatically and dynamically, variables describing the user, the services and the log navigation that improve the quality of the recommendation;
• Ensure the proper functioning of the recommender system in case of registration of a new user whose profile is poor (or nonexistent) or in case of creating new services (items) that no one (in our data set) has yet rated or visited. This problem is well known in the field of information filtering and is referred to as the "Cold Start" problem. Most solutions for the cold-start problem [Lam et al. 2008] are not suitable as they require users to rate items;
• Develop a generic recommender system, i.e. one that adapts to any application. The challenge is to design a real-time recommender system that filters resources dynamically depending on variation in user interests but also on variation in the environment. The idea is to associate with each resource a ranking based on the user profile and its context. For that, we use incremental learning techniques [3] and data stream mining [4], which require a limited number of passes over the data and process data on the fly. Using these methods reduces computation time and memory space so that we can ensure robustness and scalability of the system;
• Define satisfactory indicators in order to assess the quality of the recommendation;
• Build a software platform integrating all the tools developed during the project.

Given the short duration of the project (18 months), we decided to study open source recommender systems. Thus, we present in the following section the related state of the art.

4. OPEN-SOURCE RECOMMENDER SYSTEMS
The growth of Web content and the expansion of e-commerce have greatly increased interest in recommender systems. This fact has led to the development of some open source projects in the area. Among the recommender system implementations available on the Web, we can distinguish the following: Duine [5], Apache Mahout [9], OpenSlopeOne [11], Cofi [2], SUGGEST [13] and Vogoo [14]. All of these projects offer collaborative-filtering implementations, in different programming languages.

The Duine Framework also supplies a hybrid implementation. It is a Java software that presents content-based and collaborative filtering in a switching engine: it dynamically switches between the two prediction techniques depending on the current
state of the data. For example, if there aren't many ratings available, it uses the content-based approach, and it switches to the collaborative one when the scenario changes. It also presents an Explanation API, which can be used to create user-friendly recommendations, and a demo application with a Java client example.

Mahout constitutes a Java framework in the data mining area. It has incorporated the Taste recommender system, a collaborative engine for personalized recommendations. Vogoo is a PHP framework that implements a collaborative filtering recommender system. It also provides a Slope One implementation.

A Java version of the Collaborative Filtering method is implemented in the Cofi library. It was developed by Daniel Lemire [6], the creator of the Slope One algorithms. There is also a PHP version available on Lemire's webpage. OpenSlopeOne offers a Slope One implementation in PHP that focuses on performance.

SUGGEST is a recommendation library made by George Karypis and distributed in a binary format.

Analyzing software in the recommendation area is not a simple task, since it is difficult to define measurement standards. In this work, we propose some evaluation criteria: types of recommendation implemented by the project, programming language, level of documentation and magnitude of the project.

The documentation was evaluated based on its volume and clarity. The volume of documentation presented by Mahout and Duine is remarkably larger than that of the other systems. Both offer installation and utilization guides and come with a demonstration example. It must be taken into account that OpenSlopeOne and Cofi are smaller projects, and thus their documentation tends to be smaller. The Downloads column represents the magnitude of the project: it gives the number of times the software, in any version, was downloaded from its source. Although Mahout does not publish this number, its very active mailing lists show that it is widely used.

The two projects that stood out were Apache Mahout and Duine. We installed and tested them in order to verify which one was more applicable to our work. Both of them are based on Java technology and present a demonstration example with the Movielens data set. The fact that Mahout is a larger project and has multiple machine-learning algorithms made it more interesting to our research. Also, its modular structure encouraged us to choose it.

5. APACHE MAHOUT
Apache Mahout is a solid project in the Data Mining area. It is a framework that features various scalable machine-learning algorithms. It is programmed in Java and is built with the Maven project manager. In April 2008, it incorporated the Taste Recommender System, a Java framework for providing personalized recommendations. Besides Taste, it also offers clustering algorithms and a Map Reduce implementation.

Taste is a very consistent and flexible collaborative filtering engine and supports the user-based, item-based and Slope One recommender systems. It can be easily modified thanks to its well-structured module abstractions. The package defines four main abstractions: DataModel, the similarity interfaces (UserSimilarity and ItemSimilarity), UserNeighborhood and Recommender.

With these interfaces, it is possible to adapt the framework to read different types of data, personalize the recommendation or even create new recommendation methods.

The UserSimilarity and ItemSimilarity abstractions are responsible for calculating the similarity between a pair of users or items. Their function usually returns a value from 0 to 1 indicating the level of resemblance, with 1 indicating the greatest similarity.

Access to the data set is made through the DataModel interface. It is possible to retrieve and store the data from databases or from filesystems (MySQLJDBCDataModel and FileDataModel respectively). The functions defined in this interface are used by the Similarity abstraction to help compute the similarity.

The main interface in Taste is Recommender. It is responsible for actually making the recommendations to the user by comparing items or by determining users with similar taste (item-based and user-based techniques). The Recommender accesses the similarity interface and uses its functions to compare a pair of users or items. It then collects the highest similarity values to offer as recommendations.

The UserNeighborhood is an auxiliary interface that helps to define the neighborhood in the user-based recommendation technique. It is known that, for larger data sets, the item-based technique provides better results. For that reason, many companies, such as Amazon [7], choose to use this approach. The Mahout framework is no different: the item-based method generally runs faster and provides more accurate recommendations.

In our project, we chose to adapt the Slope One approach (a type of item-based algorithm) to our problem. Here follows a simple Java application example of how to initiate a recommendation with the Slope One technique:

    import java.io.File;
    import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
    import org.apache.mahout.cf.taste.impl.recommender.CachingRecommender;
    import org.apache.mahout.cf.taste.impl.recommender.slopeone.SlopeOneRecommender;
    import org.apache.mahout.cf.taste.model.DataModel;
    import org.apache.mahout.cf.taste.recommender.Recommender;

    // Load the ratings file, build a Slope One recommender, then cache its results.
    DataModel model = new FileDataModel(new File("data.txt"));
    Recommender recommender = new SlopeOneRecommender(model);
    Recommender cachingRecommender = new CachingRecommender(recommender);

The challenge in adapting this approach to our project was the fact that our input data file was available in the XML format, a type not handled by Mahout. It was therefore necessary to incorporate another file type into the DataModel interface. We created a program that deals with the XML input files. To test this new data handler, we used the Movielens data set. A pack with one million ratings was converted to XML to be used as an example. With this data set and the XML file, the Slope One algorithm runs in less than one minute.

6. WEB USAGE MINING FOR RECOMMENDATION
One objective of the SoliMobile project is to develop a recommender system that is as unintrusive as possible. This implies that the system relies only on information that the user is free to provide (explicit data) and must run properly with alternatives such as implicit data mining. To meet this need, we are studying how to append Web browsing analysis to the recommender system, as done in [8]. Web browsing analysis becomes almost necessary for extracting and understanding user behaviors.

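As an illustration of how navigation traces could feed the ratings matrix without asking users for explicit ratings, the sketch below accumulates click events one at a time and turns click counts into implicit ratings. Everything in it is an assumption made for illustration: the class and method names are hypothetical, the rule that caps the click count at 5 is arbitrary, and the `user,item,rating` output layout is only meant to resemble the kind of delimited ratings file a file-based data model would read. It is not the handler used in the SoliMobile project.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical one-pass accumulator of implicit ratings from click events. */
final class ImplicitRatingsSketch {
    // user id -> (service id -> number of clicks); insertion order keeps output stable
    private final Map<Long, Map<Long, Integer>> clicks = new LinkedHashMap<>();

    /** Processes a single log event; each event is looked at exactly once. */
    void onClick(long userId, long serviceId) {
        clicks.computeIfAbsent(userId, k -> new LinkedHashMap<>())
              .merge(serviceId, 1, Integer::sum);
    }

    /** Maps a click count to a 1..5 rating by capping it (an arbitrary assumption). */
    static int toRating(int clickCount) {
        return Math.max(1, Math.min(5, clickCount));
    }

    /** Renders one "user,service,rating" line per observed user-service pair. */
    String toCsv() {
        StringBuilder sb = new StringBuilder();
        clicks.forEach((user, perService) ->
            perService.forEach((service, n) ->
                sb.append(user).append(',').append(service).append(',')
                  .append(toRating(n)).append('\n')));
        return sb.toString();
    }

    public static void main(String[] args) {
        ImplicitRatingsSketch sketch = new ImplicitRatingsSketch();
        sketch.onClick(1L, 10L);
        sketch.onClick(1L, 10L);
        sketch.onClick(2L, 10L);
        System.out.print(sketch.toCsv());
    }
}
```

Because only per-pair counters are kept, the raw log never needs to be stored, which is the property that makes this style of processing compatible with logs that arrive continuously.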
{| class="wikitable"
|+ Table 1: Utilisation ratio of each method.
! Project !! Implementation !! Language !! Documentation !! Downloads
|-
| Mahout || Item-based, User-based, Slope One || Java || High || Not available
|-
| Duine || User-based, Content Filtering || Java || High || 1,113
|-
| Cofi || Item-based || Java || Low || Not available
|-
| OpenSlopeOne || Slope One || PHP || Low || 653
|-
| Vogoo || Slope One, Item-based || PHP || Medium || 2,128
|-
| SUGGEST || Item-based, User-based || C || Medium || Not available
|}
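Slope One appears in three of the projects compared in Table 1 and is the technique adopted in Section 5, so a small self-contained sketch of the weighted Slope One scheme described in [6] may help. It is written in plain Java and is not Mahout's SlopeOneRecommender; the class name, the in-memory map layout and the toy ratings are assumptions chosen for illustration. For each item the user has rated, it computes the average rating difference between the target item and that item over all users who rated both, then combines the resulting predictions weighted by the number of such users.

```java
import java.util.HashMap;
import java.util.Map;

/** Minimal weighted Slope One sketch (illustrative; not Mahout's implementation). */
final class SlopeOneSketch {

    /**
     * Predicts the rating of user for target from ratings (user -> item -> rating).
     * Returns NaN when the user shares no co-rated item with the target.
     */
    static double predict(Map<String, Map<String, Double>> ratings, String user, String target) {
        Map<String, Double> own = ratings.getOrDefault(user, Map.of());
        double numerator = 0.0;
        int totalWeight = 0;
        for (Map.Entry<String, Double> rated : own.entrySet()) {
            String item = rated.getKey();
            if (item.equals(target)) {
                continue;
            }
            // Average deviation of target from item over users who rated both.
            double diffSum = 0.0;
            int count = 0;
            for (Map<String, Double> other : ratings.values()) {
                Double rTarget = other.get(target);
                Double rItem = other.get(item);
                if (rTarget != null && rItem != null) {
                    diffSum += rTarget - rItem;
                    count++;
                }
            }
            if (count == 0) {
                continue;
            }
            double deviation = diffSum / count;
            // Weight each per-item prediction by the number of co-rating users.
            numerator += (deviation + rated.getValue()) * count;
            totalWeight += count;
        }
        return totalWeight == 0 ? Double.NaN : numerator / totalWeight;
    }

    public static void main(String[] args) {
        // Toy ratings, invented for the example.
        Map<String, Map<String, Double>> ratings = new HashMap<>();
        ratings.put("john", Map.of("A", 5.0, "B", 3.0, "C", 2.0));
        ratings.put("mark", Map.of("A", 3.0, "B", 4.0));
        ratings.put("lucy", Map.of("B", 2.0, "C", 5.0));
        System.out.println(predict(ratings, "lucy", "A"));
    }
}
```

With these toy ratings, the deviation of A from B is 0.5 (two users) and of A from C is 3 (one user), so the prediction for lucy on A is ((0.5 + 2) * 2 + (3 + 5) * 1) / 3, roughly 4.33. A production implementation would precompute the deviations rather than rescanning all users on every call.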
In recent years, Web usage mining has become an important issue in the field of data mining. Web usage mining focuses on predicting and learning the users' preferences on the Internet. Generally, the data for Web usage mining are the user interactions on the web, usually residing on Web clients, Web servers, and proxy servers. The aim of Web usage mining is to analyze user behavior through analysis of the user's interaction with the Web platform. This analysis is particularly focused on all the user's clicks while visiting the web application (also known as clickstream analysis). The interest of Web usage mining in our framework is to enrich the input of the recommender system with user data extracted from the raw clickstream data, in order to refine the user profiles and behavioral patterns. The analysis of Web logs can also be used as implicit feedback from the user, which makes it possible to assess the performance of the models involved in the recommender system.

Web logs change over time for several reasons: an update of the Web application content or structure, a change in the user preferences, a change in the execution context, etc. This is why it is important to take into account the temporal dimension in the analysis of Web usage. To consider the temporal data in a dynamic way, we plan to use the techniques of data stream mining. By definition, a data stream is a real-time, continuous, ordered (implicitly by arrival time or explicitly by timestamp) sequence of items. It is impossible to control the order in which items arrive, nor is it feasible to locally store a stream in its entirety. Therefore, all processing has to be applied in one pass. Several techniques for mining data streams have emerged, such as CluStream for clustering, StreamSamp for sampling, VFDT for incremental decision trees, etc. The reader may refer to [4] for more explanations on these different techniques.

7. CONCLUSIONS
In this paper, we presented the problem that we deal with in the SoliMobile project. Then, we presented the global architecture that is under development in this project. This architecture includes a recommender system to customize the services offered to users based on their profile and their browsing history. Given the limited duration of the project, we opted for an open source recommender system that is modular in order to easily integrate future developments, in particular the use of Web usage mining to address the problem of cold start.

In this paper, we also discussed several points concerning the treatment of the temporal dimension in data analysis. The issues raised demonstrate the need for defining new methods or adapting existing methods for extracting knowledge and monitoring changing and evolving data. Although there are many efficient methods for extracting knowledge, few studies have been devoted to the issue of temporally evolving data.

8. REFERENCES
[1] John S. Breese, David Heckerman, and Carl Kadie. Empirical analysis of predictive algorithms for collaborative filtering. pages 43–52, 1998.
[2] Cofi. http://www.nongnu.org/cofi/.
[3] Antoine Cornuéjols. Getting order independence in incremental learning. In ECML '93: Proceedings of the European Conference on Machine Learning, pages 196–212, London, UK, 1993. Springer-Verlag.
[4] Baptiste Csernel, Fabrice Clerot, and Georges Hébrail. Streamsamp: Datastream clustering over tilted windows through sampling. ECML PKDD 2006: the International Workshop on Knowledge Discovery from Data Streams (IWKDDS-2006), 2006.
[5] Duine. http://www.duineframework.org/.
[6] Daniel Lemire and Anna Maclachlan. Slope one predictors for online rating-based collaborative filtering. 2005.
[7] Greg Linden, Brent Smith, and Jeremy York. Amazon.com recommendations: Item-to-item collaborative filtering. IEEE Internet Computing, 7(1):76–80, 2003.
[8] Jiahui Liu, Peter Dolan, and Elin Rønby Pedersen. Personalized news recommendation based on click behavior. In IUI '10: Proceeding of the 14th international conference on Intelligent user interfaces, pages 31–40, New York, NY, USA, 2010. ACM.
[9] Mahout. http://mahout.apache.org/.
[10] Raymond J. Mooney and Loriene Roy. Content-based book recommending using learning for text categorization. In DL '00: Proceedings of the fifth ACM conference on Digital libraries, pages 195–204, New York, NY, USA, 2000. ACM.
[11] OpenSlopeOne. http://code.google.com/p/openslopeone/.
[12] ProximaMobile. http://www.proximamobile.fr/.
[13] Suggest. http://glaros.dtc.umn.edu/gkhome/suggest/overview.
[14] Vogoo. http://www.vogoo-api.com/.