Open Source Recommendation Systems for Mobile Application

Renata Ghisloti De Souza, LISITE-ISEP, 28 rue Notre Dame Des Champs, 75006 Paris, renata.ghisloti@isep.fr
Raja Chiky, LISITE-ISEP, 28 rue Notre Dame Des Champs, 75006 Paris, raja.chiky@isep.fr
Zakia Kazi Aoul, LISITE-ISEP, 28 rue Notre Dame Des Champs, 75006 Paris, zakia.kazi@isep.fr

ABSTRACT
The aim of Recommender Systems is to suggest useful items to users. Three major techniques can be highlighted in these systems: Collaborative Filtering, Content-Based Filtering and Hybrid Filtering. The collaborative method proposes recommendations based on what a group of users has enjoyed, and it is widely used in Open Source Recommender Systems. The work presented in this paper takes place in the context of the SoliMobile Project, which aims to design, build and implement a package of innovative services focused on individuals in unstable situations (unemployment, homelessness, etc.). In this paper, we present a study of open source recommender systems and their usefulness for SoliMobile. The paper also presents how our recommender system is fed by extracting implicit ratings using the techniques of Web Usage Mining.

Categories and Subject Descriptors
H.3.4 [Systems and Software]: Performance evaluation (efficiency and effectiveness); H.2.8 [Database applications]: Data mining - Web usage mining

General Terms
Algorithms, Experimentation, Theory

Keywords
Open source recommender systems, collaborative filtering, Mahout, Web usage mining

1. INTRODUCTION
The amount of information on the web has greatly increased in the past decade. This phenomenon has promoted the advance of the recommender systems research area. These systems intend to help users by providing them with useful suggestions. They may suggest items in different manners, such as comparing the user's tastes with those of other users or comparing the user's preferences with the descriptions of items. These two methods are called collaborative filtering [1] and content-based filtering [10], respectively. The collaborative method presents advantages over the content-based one: it is more efficient in practice and simpler to implement. For this reason, the majority of open source projects choose it. Current open source recommender system projects are usually built on the item-based approach, a type of collaborative filtering. They differ in programming language, extent of documentation and magnitude of the project.

In this paper, we give an overview of known recommendation techniques and we analyze open source projects in this field of research. Our interest in recommender systems is justified by the fact that we have to choose one of the studied systems and integrate it in a complex platform that includes a Web platform, a personalization system and a mobile interface. This platform is developed through the SoliMobile project, funded by ProximaMobile [12].

The SoliMobile project, in which we are involved, aims at providing a portal of services helping and assisting people who are in different unstable situations. This project provides end users with information that facilitates access to charity services from anywhere. The portal has to offer services adapted to each user profile, taking into account the user's preferences and navigation traces. Our work aims to provide the user with a recommendation of items (services) based on the profile. The recommendation's main function is to aggregate content from different sources and from the mobile Web portal and to customize the presentation of services for each user according to his or her profile. It allows classification or restriction of services into a selection that fits the user profile.

The remainder of this paper is organized as follows. Section 2 describes the context of the work presented in this paper.
In Section 3, we present the global architecture of the SoliMobile Project. Section 4 details the analysis of existing Open Source Recommender Systems. The recommender system used in the project is explained in Section 5. Section 6 presents the utility of Web usage mining in the recommendation. Finally, Section 7 concludes this paper and gives an outlook on our ongoing and future research in this area.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Copyright is held by the author/owner(s). Workshop on the Practical Use of Recommender Systems, Algorithms and Technologies (PRSAT 2010), held in conjunction with RecSys 2010. September 30, 2010, Barcelona, Spain.

2. WORK CONTEXT
The work presented in this paper fits in a collaborative project that aims to design, develop and implement a set of innovative services focused on persons in situations of instability or emerging from instability, in order to help them find useful information such as jobs, housing offers, welfare, or medical assistance. The charity association partner in the project observed that a large majority of people in unstable situations own a mobile phone, which is considered a link with family, friends or society. The project aims to make it easier for vulnerable people to access charity services using their mobile phone from anywhere. However, not all services are suitable for every person in a precarious situation.
For example, a single mother needs child services such as pediatrics or a nursery, while an unemployed person needs services to find a job or professional training.

Our role in this project is to enhance and customize services for users. In this context, we deal with the implementation of algorithms that adapt the platform services to the user profile, filter them and show only items that may be of interest. Personalization according to the user profile is based on data available on the platform (e.g. databases), on the features and traces of user navigation, and also on the social environment of the user (collaborative filtering approach).

Typically, adaptation to the user profile will consolidate the resources (services) to target only the relevant users. Conversely, user profiles will also be ordered to form homogeneous groups in order to assign them to a given resource.

3. GLOBAL ARCHITECTURE
We present in this section the overall architecture of the application, illustrated in Figure 1, in order to show the role of the recommendation in the SoliMobile platform. End users create an account via the Web platform, where many services are provided. The traces of Web browsing (also called logs) are collected from servers to feed the recommender system. These navigation traces will be used to create the user-item ratings matrix. Services play the role of items. Information regarding the user profile, such as age, address, occupation or preferences, as well as information describing the services, such as the category of service (health, employment, child care, etc.) and their addresses, will be provided as input to the recommender system. These inputs will be sent in XML format through Web services. Once the ratings matrix is constructed, the recommendation is made to categorize and customize the layout of the proposed services on the mobile phone. The recommendation system will provide as output an XML file that contains a subset of sorted services to be transmitted to the mobile. Traces of mobile browsing will also be used as input to the recommender system to improve results; they can also serve as feedback to our system.

Figure 1: Global architecture of SoliMobile Project

Our goal is structured along the following lines:

• Construct a generic model for the user profile and also for structural and semantic information of the application in order to integrate new data when needed;

• Select, automatically and dynamically, variables describing the user, the services and the log navigation that improve the quality of the recommendation;

• Ensure the proper functioning of the recommender system when a new user registers whose profile is poor (or nonexistent), or when new services (items) are created that no one (in our data set) has yet rated or visited. This problem is well known in the field of information filtering and is referred to as the "Cold Start" problem. Most solutions for the cold-start problem [Lam et al. 2008] are not suitable, as they require users to rate items;

• Develop a generic recommender system, i.e. one that adapts to any application. The challenge is to design a real-time recommender system that filters resources dynamically depending on variation in user interests but also on variation in the environment. The idea is to associate with each resource a ranking based on the user profile and its context. For that purpose we use incremental learning techniques [3] and data stream mining [4], which require a limited number of passes over the data and process data on the fly. Using these methods reduces computation time and memory space, so we can ensure robustness and scalability of the system;

• Define satisfactory indicators in order to assess the quality of the recommendation;

• Build a software platform integrating all the tools developed during the project.

Given the short duration of the project (18 months), we decided to study open source recommender systems. Thus, we present in the following section the related state of the art.

4. OPEN-SOURCE RECOMMENDER SYSTEMS
The growth of Web content and the expansion of e-commerce have deeply increased the interest in recommender systems. This has led to the development of some open source projects in the area. Among the recommender system implementations available on the Web, we can distinguish the following: Duine [5], Apache Mahout [9], OpenSlopeOne [11], Cofi [2], SUGGEST [13] and Vogoo [14]. All of these projects offer collaborative-filtering implementations, in different programming languages.

The Duine Framework also supplies a hybrid implementation. It is a Java software that combines content-based and collaborative filtering in a switching engine: it dynamically switches between the two prediction techniques given the current state of the data. For example, if there are not many ratings available, it uses the content-based approach, and switches to the collaborative one when the scenario changes. It also provides an Explanation API, which can be used to create user-friendly recommendations, and a demo application with a Java client example.

Mahout is a Java framework in the data mining area. It has incorporated the Taste recommender system, a collaborative engine for personalized recommendations. Vogoo is a PHP framework that implements a collaborative filtering recommender system. It also provides a Slope One implementation.

A Java version of the collaborative filtering method is implemented in the Cofi library. It was developed by Daniel Lemire [6], the creator of the Slope One algorithms. There is also a PHP version available on Lemire's webpage. OpenSlopeOne offers a Slope One implementation in PHP that focuses on performance.

SUGGEST is a recommendation library made by George Karypis and distributed in a binary format.

Analyzing software in the recommendation area is not a simple task, since it is difficult to define measurement standards. In this work, we propose some evaluation criteria: types of recommendation implemented by the project, programming language, level of documentation and magnitude of the project.

The documentation was evaluated based on its volume and clarity. The volume of documentation provided by Mahout and Duine is remarkably larger than that of the other systems. Both offer installation and usage guides and come with a demonstration example. It must be taken into account that OpenSlopeOne and Cofi are smaller projects, and thus their documentation tends to be smaller. The Downloads column represents the magnitude of the project: it gives the number of times the software, in any version, was downloaded from its source. Although Mahout does not publish this number, its very active mailing lists show that it is widely used.

The two projects that stood out were Apache Mahout and Duine. We installed and tested them in order to verify which one was more applicable to our work. Both of them are based on Java technology and provide a demonstration example with the Movielens data set. The fact that Mahout is a larger project and has multiple machine-learning algorithms made it more interesting for our research. Its modular structure also encouraged us to choose it.

Project        Implementation                      Language   Documentation   Downloads
Mahout         Item-based, User-based, Slope One   Java       High            Not available
Duine          User-based, Content Filtering       Java       High            1,113
Cofi           Item-based                          Java       Low             Not available
OpenSlopeOne   Slope One                           PHP        Low             653
Vogoo          Slope One, Item-based               PHP        Medium          2,128
SUGGEST        Item-based, User-based              C          Medium          Not available

Table 1: Utilisation ratio of each method.

5. APACHE MAHOUT
Apache Mahout is a solid project in the data mining area. It is a framework that features various scalable machine-learning algorithms. It is written in Java and built with the Maven project manager. In April 2008, it incorporated the Taste Recommender System, a Java framework for providing personalized recommendations. Besides Taste, it also offers clustering algorithms and a Map Reduce implementation.

Taste is a very consistent and flexible collaborative filtering engine and supports user-based, item-based and Slope One recommender systems. It can be easily modified thanks to its well-structured module abstractions. The package defines four main abstractions: DataModel, the similarity interfaces (UserSimilarity and ItemSimilarity), UserNeighborhood and Recommender. With these interfaces, it is possible to adapt the framework to read different types of data, personalize the recommendation or even create new recommendation methods.

The UserSimilarity and ItemSimilarity abstractions are responsible for calculating the similarity between a pair of users or items. Their function usually returns a value from 0 to 1 indicating the level of resemblance, 1 being the most similar possible.

Access to the data set is made through the DataModel interface. It is possible to retrieve and store the data from databases or from filesystems (MySQLJDBCDataModel and FileDataModel respectively). The functions of this interface are used by the similarity abstractions to compute the similarity.

The main interface in Taste is Recommender. It is responsible for actually making the recommendations to the user by comparing items or by determining users with similar taste (item-based and user-based techniques). The Recommender accesses the similarity interface and uses its functions to compare a pair of users or items. It then collects the highest similarity values to offer as recommendations.

UserNeighborhood is an auxiliary interface that helps to define the neighborhood in the user-based recommendation technique. It is known that, for larger data sets, the item-based technique provides better results. For this reason, many companies, such as Amazon [7], choose this approach. The Mahout framework is no different: the item-based method generally runs faster and provides more accurate recommendations.

In our project, we chose to adapt the Slope One approach (a type of item-based algorithm) to our problem. Here follows a simple Java example of how to initiate a recommendation with the Slope One technique:

    // Load the ratings from a file
    DataModel model = new FileDataModel(new File("data.txt"));
    // Build a Slope One recommender on top of the data model
    Recommender recommender = new SlopeOneRecommender(model);
    // Wrap it so that computed recommendations are cached
    Recommender cachingRecommender = new CachingRecommender(recommender);

The challenge in adapting this approach to our project was that our input data was available in XML format, a type not handled by Mahout. We therefore had to add support for another file type behind the DataModel interface, and we wrote a program that handles the XML input files. To test this new data handler, we used the Movielens data set. A pack with one million ratings was converted to XML to be used as an example. With this data set and the XML file, the Slope One algorithm runs in less than one minute.

6. WEB USAGE MINING FOR RECOMMENDATION
One objective of the SoliMobile project is to develop a recommender system that is as unintrusive as possible. This implies that the system relies only on information that the user is free to provide (explicit data) and must run properly with alternatives such as implicit data mining. To meet this need, we are studying how to add Web browsing analysis to the recommender system, as done in [8]. Web browsing analysis becomes almost necessary for extracting and understanding user behaviors.
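
As an illustration of how navigation traces could feed the user-item ratings matrix, here is a minimal sketch; it is not the SoliMobile implementation. It assumes that each trace has already been reduced to a line of the form userId,serviceId (one line per visit), and that a rating of at most 5 is obtained by scaling the visit count against the user's most visited service. The class name ImplicitRatings, the method fromTraces, the trace format and the scaling rule are all assumptions introduced for this example.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: derives implicit ratings from navigation traces. */
final class ImplicitRatings {

    private ImplicitRatings() {}

    /**
     * Assumed trace format: "userId,serviceId", one line per visit.
     * Returns user -> (service -> rating). The rating is 1 + 4 * count / max,
     * where max is the visit count of that user's most visited service,
     * so the most visited service always gets 5.0.
     */
    static Map<String, Map<String, Double>> fromTraces(List<String> traces) {
        // First pass: count visits per (user, service) pair.
        Map<String, Map<String, Integer>> counts = new LinkedHashMap<>();
        for (String line : traces) {
            String[] parts = line.split(",");
            if (parts.length != 2) {
                continue; // skip malformed lines
            }
            counts.computeIfAbsent(parts[0].trim(), k -> new LinkedHashMap<>())
                  .merge(parts[1].trim(), 1, Integer::sum);
        }

        // Second pass: scale each user's counts against that user's maximum.
        Map<String, Map<String, Double>> ratings = new LinkedHashMap<>();
        for (Map.Entry<String, Map<String, Integer>> user : counts.entrySet()) {
            int max = 0;
            for (int c : user.getValue().values()) {
                max = Math.max(max, c);
            }
            Map<String, Double> row = new LinkedHashMap<>();
            for (Map.Entry<String, Integer> e : user.getValue().entrySet()) {
                row.put(e.getKey(), 1.0 + 4.0 * e.getValue() / max);
            }
            ratings.put(user.getKey(), row);
        }
        return ratings;
    }
}
```

For the traces u1,health, u1,health and u1,jobs, this sketch rates health at 5.0 and jobs at 3.0 for user u1. Other signals available in the logs, such as the time spent on a page, could replace the visit count without changing the overall structure.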
In recent years, Web usage mining has become an important issue in the field of data mining. Web usage mining focuses on learning and predicting user preferences on the Internet. Generally, the data for Web usage mining are the user interactions on the web, usually residing on Web clients, Web servers, and proxy servers. The aim of Web usage mining is to analyze user behavior through analysis of the user's interaction with the Web platform. This analysis is particularly focused on all the user's clicks when visiting the web application (also known as clickstream analysis). The interest of Web usage mining in our framework is to enrich the input of the recommender system with user data extracted from the raw clickstream data, in order to refine the user profiles and behavioral patterns. The analysis of Web logs can also be used as implicit feedback from the user, which makes it possible to assess the performance of the models involved in the recommender system.

Web logs change over time for several reasons: an update of the Web application content or structure, a change in the user preferences, a change in the execution context, etc. This is why it is important to take into account the temporal dimension in the analysis of Web usage. To consider the temporal data in a dynamic way, we plan to use data stream mining techniques. By definition, a data stream is a real-time, continuous, ordered (implicitly by arrival time or explicitly by timestamp) sequence of items. It is impossible to control the order in which items arrive, nor is it feasible to locally store a stream in its entirety. Therefore, all processing has to be applied in one pass. Several techniques for mining data streams have emerged, such as CluStream for clustering, StreamSamp for sampling, VFDT for incremental decision trees, etc. The reader may refer to [4] for more explanations on these different techniques.

7. CONCLUSIONS
In this paper, we presented the problem that we deal with in the SoliMobile project. Then, we presented the global architecture that is under development in this project. This architecture includes a recommender system to customize the services offered to users based on their profile and their browsing history. Given the limited duration of the project, we opted for an open source recommender system that is modular in order to easily integrate future developments, in particular the use of Web usage mining to address the cold-start problem.

In this paper, we also discussed several points concerning the treatment of the temporal dimension in data analysis. The issues raised demonstrate the need for defining new methods or adapting existing methods for extracting knowledge from and monitoring changing and evolving data. Although there are many efficient methods for extracting knowledge, few studies have been devoted to the issue of temporally evolving data.

8. REFERENCES
[1] John S. Breese, David Heckerman, and Carl Kadie. Empirical analysis of predictive algorithms for collaborative filtering. pages 43–52, 1998.
[2] Cofi. http://www.nongnu.org/cofi/.
[3] Antoine Cornuéjols. Getting order independence in incremental learning. In ECML '93: Proceedings of the European Conference on Machine Learning, pages 196–212, London, UK, 1993. Springer-Verlag.
[4] Baptiste Csernel, Fabrice Clerot, and Georges Hébrail. Streamsamp: Datastream clustering over tilted windows through sampling. ECML PKDD 2006: the International Workshop on Knowledge Discovery from Data Streams (IWKDDS-2006), 2006.
[5] Duine. http://www.duineframework.org/.
[6] Daniel Lemire and Anna Maclachlan. Slope one predictors for online rating-based collaborative filtering. 2005.
[7] Greg Linden, Brent Smith, and Jeremy York. Amazon.com recommendations: Item-to-item collaborative filtering. IEEE Internet Computing, 7(1):76–80, 2003.
[8] Jiahui Liu, Peter Dolan, and Elin Rønby Pedersen. Personalized news recommendation based on click behavior. In IUI '10: Proceeding of the 14th international conference on Intelligent user interfaces, pages 31–40, New York, NY, USA, 2010. ACM.
[9] Mahout. http://mahout.apache.org/.
[10] Raymond J. Mooney and Loriene Roy. Content-based book recommending using learning for text categorization. In DL '00: Proceedings of the fifth ACM conference on Digital libraries, pages 195–204, New York, NY, USA, 2000. ACM.
[11] OpenSlopeOne. http://code.google.com/p/openslopeone/.
[12] ProximaMobile. http://www.proximamobile.fr/.
[13] Suggest. http://glaros.dtc.umn.edu/gkhome/suggest/overview.
[14] Vogoo. http://www.vogoo-api.com/.