=Paper=
{{Paper
|id=Vol-1348/maics2013_paper_11
|storemode=property
|title=An Initial Investigation of Multi-Cyclic Training Regimen for Collaborative Filtering Models in GraphChi
|pdfUrl=https://ceur-ws.org/Vol-1348/maics2013_paper_11.pdf
|volume=Vol-1348
|dblpUrl=https://dblp.org/rec/conf/maics/CurnaliaL13
}}
==An Initial Investigation of Multi-Cyclic Training Regimen for Collaborative Filtering Models in GraphChi==
James W. Curnalia and Alina Lazar
Department of Computer Science and Information Systems
Youngstown State University, Youngstown, Ohio 44555
jwcurnalia@student.usu.edu, alazar@ysu.edu

Abstract

More and more outlets are utilizing collaborative filtering techniques to make sense of the sea of data generated by our hyper-connected world. How a collaborative filtering model is generated can be the difference between accurate and flawed predictions. This study aims to determine the impact of a cyclical training regimen on the algorithms presented in the Collaborative Filtering Toolkit for GraphChi.

Introduction to Collaborative Filtering

With the growth of on-line services and e-commerce, the amount of data that users need to make sense of has increased exponentially. The difficult task facing these service providers is to sift through a sea of options and find what a user wants before their competitors do. One of the tools they have at their disposal is collaborative filtering.

One of the central assumptions behind collaborative filtering is that people with similar tastes, or shopping histories, are better predictors of a user's future behavior than a random person. Collaborative filtering techniques sift through large datasets to identify patterns and similarities between users, and then make recommendations. These processes are not limited to the retail sphere; collaborative filtering has also been applied to financial, geological, and other endeavors. An examination of a small example of collaborative filtering, used to produce song recommendations for customers, will help illustrate some of these ideas.

Table 1: Example User Song History and Ratings

  User  song1  song2  song3  song4  song5  song6  song7
  A     3      X      X      4      1      5      2
  B     1      2      X      2      4      X      4
  C     5      X      4      X      1      4      X
  D     2      X      1      X      5      X      5

The example data covers four different customers (A-D) and their listening history over a catalog of seven songs. In addition to the fact that a user has listened to a particular song, there is an explicit rating (1-5, 5 being the highest). If a user has not listened to a song, this fact is highlighted with an 'X'. A predicted rating needs to be generated for these unknown songs, so that they can be suggested to the appropriate users.

The first step in the process is to identify which users have similar tastes. User A and User C have three points of similarity. They both responded very positively to song6 (ratings of A:5 and C:4) and very negatively to song5 (both had ratings of 1). The third point of similarity, song1, is less significant: User C responded very favorably to the song, while User A had a neutral reaction. A similar process can generate a taste profile for Users B and D (song1 B:1/D:2, song5 B:4/D:5, and song7 B:4/D:5). Using these pairings we can begin to identify likely reactions to some of the users' unknown songs.

Table 2: Suggestions Based on Similarities

  User  song1  song2  song3  song4  song5  song6  song7
  A     3      X      +      4      1      5      2
  B     1      2      -      2      4      X      4
  C     5      X      4      +      1      4      -
  D     2      -      1      -      5      X      5

Six of the unknown songs have been replaced with a suggested reaction (+ for positive or - for negative). An exact score would require further analysis, and most likely a great deal more information, but this simpler measure of attitude can be inferred.

This still leaves four unknown songs in the table. A quick look over the table shows that the two pairs of users have completely opposite tastes in music. It would be reasonable to assume that what one pair likes the other will dislike, and vice versa. In this way we can utilize not only the similarity between users to determine attitudes, but also the reactions of diametrically opposed users.

Now all that remains is to scale the entire process to handle millions of users and products, while maintaining accuracy and minimizing costs (both financial and temporal).
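The pairing logic described above can be made concrete. The following Python sketch is illustrative only: the ratings come from Table 1, but the similarity measure (mean closeness of shared ratings) is a simplification chosen for this example, not the method used by the CFT algorithms discussed later.

```python
# Ratings from Table 1; None marks an unknown song ('X' in the table).
ratings = {
    "A": [3, None, None, 4, 1, 5, 2],
    "B": [1, 2, None, 2, 4, None, 4],
    "C": [5, None, 4, None, 1, 4, None],
    "D": [2, None, 1, None, 5, None, 5],
}

def similarity(u, v):
    """Mean closeness of ratings over songs both users have rated.

    Returns a value in [0, 1]: 1.0 means identical ratings on every
    shared song, 0.0 means maximally opposed (e.g. 1 vs 5).
    """
    shared = [(a, b) for a, b in zip(ratings[u], ratings[v])
              if a is not None and b is not None]
    if not shared:
        return 0.0
    # Ratings span 1-5, so the largest possible gap is 4.
    return sum(1 - abs(a - b) / 4 for a, b in shared) / len(shared)

# Rank all user pairs from most to least similar.
pairs = sorted(
    ((u, v) for u in ratings for v in ratings if u < v),
    key=lambda p: similarity(*p), reverse=True)
for u, v in pairs:
    print(u, v, round(similarity(u, v), 2))
```

With the Table 1 data this ranks (A, C) and (B, D) highest (0.75 each), matching the pairings identified above, while the cross-pair combinations score much lower.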
Related Work

Collaborative filtering has traditionally included methods such as Bayesian networks and clustering. A Bayesian network uses probability and causal relationships to classify new observations (Pearl 1994). Clustering algorithms attempt to represent observations as "points" in a multi-dimensional space; the closer together two points are, the more similar the underlying observations (Witten, Frank and Hall 2011). These and other traditional methods are, and will continue to be, powerful and valid collaborative filtering methods.

However, in the wake of events like the Netflix Prize, a new series of algorithms was developed to deal with a new phenomenon: "Big Data". These new algorithms sought to incorporate the concepts of the traditional methods into new frameworks capable of dealing with data that was increasingly large, complex, and often extremely sparse.

This study will concern itself with thirteen of these modern collaborative filtering algorithms:
• Alternating Least Squares (ALS) (Zhou et al. 2008)
• Stochastic Gradient Descent (SGD) (Koren 2009)
• Bias Stochastic Gradient Descent (BSGD) (Koren 2008)
• Koren's Singular Value Decomposition (SVD++) (Koren 2008)
• Weighted ALS (WALS) (Hu, Koren, and Volinsky 2008)
• Non-Negative Matrix Factorization (NMF) (Lee and Seung 2001)
• Singular Value Decomposition (SVD) (Hernandez, Roman, and Tomas 2007)
• One-Sided SVD (Hernandez, Roman, and Tomas 2007)
• Tensor ALS (TALS) (Comon, Luciani, and de Almeida 2009)
• Restricted Boltzmann Machines (RBM) (Hinton 2010)
• Time-SVD++ (TSVD++) (Koren 2009)
• libFM (Rendle 2010)
• Probabilistic Matrix Factorization (PMF) (Salakhutdinov and Mnih 2008)

Each of these modern collaborative filtering algorithms seeks to either reduce or capitalize on the complexity of large datasets. Algorithms utilizing decomposition, factorization, and SGD seek to reduce the dimensionality of the data in order to expose underlying relationships. Least squares methods treat recommendations as linear equations, and attempt to find the best estimation of the parameters necessary to calculate an accurate rating. TALS and TSVD++ try to leverage additional information, in this case time, in order to more accurately model behaviors.

This study will focus on graph-based implementations of these algorithms, given their recent popularity and their ability to be executed on smaller machines. Recently, graph-based algorithms have been adopted by many large commercial websites, including Amazon and YouTube (Walia 2008). While it is important to note the adoption of techniques like this by powerful and influential corporations, it is admittedly the latter quality that was the driving force in adopting a graph-based approach.

The GraphLab Project was developed in order to facilitate distributed, parallel, graph-based algorithms in an efficient and reliable manner (Low et al. 2010). GraphChi is an offshoot of the GraphLab Project that seeks to leverage the power of graph-based algorithms on a single machine, while maintaining high performance standards (Kyrola, Blelloch, and Guestrin 2012). Bickson, one of the original developers of GraphLab, has ported a number of collaborative filtering algorithms from GraphLab to GraphChi in the form of the Collaborative Filtering Toolkit (CFT). Thirteen algorithms were supported at the time this study was run, December 2012, but two additional algorithms had already been added as this paper was being written, January 2013.

In addition to developing the toolkit, Bickson has written a blog entry which serves as an introduction to the CFT and its underlying algorithms (Bickson 2012). In the tutorial, Bickson identifies a number of algorithms which have an element of fault tolerance. These algorithms allow the user to save the model to disk and then resume training from that exact state.

Experimentation with the fault-tolerant algorithms seemed to yield an additional benefit, in that the accuracy of the model would often jump between executions of the training epochs (this paper will use the terms "epoch" and "cycle" interchangeably to refer to a group of training iterations). This would seem to suggest that there is an advantage to using multiple cycles beyond simple fault tolerance. The cumulative value of this inter-cycle bump in accuracy could be quite significant.

As these algorithms train a model, they attempt to reduce the sample space in order to converge on an optimal answer. Given that these algorithms become more restrictive and focused the longer they run, it is reasonable to assume that restarting an algorithm could have a significant positive impact on the final model. By loosening the bounds placed on the algorithm, it is possible to detect whether the current parameters have converged on a local rather than global minimum.

Proposal

The observed increase in inter-epoch accuracy of models being trained with fault-tolerant algorithms suggests that there is significant value to be gained from such a training regimen. This study will first seek to establish whether there is in fact a boost in model accuracy between training cycles. If this is successful, we will attempt to identify a cycle size that results in the best final model.

It is our belief that a fault-tolerant algorithm, utilizing an epoch size of no more than 20 iterations, should result in a model that is significantly more accurate than those trained using a single-epoch algorithm.
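The factorization idea shared by most of the algorithms listed above can be illustrated with a minimal sketch: approximate the sparse rating matrix as a product of two low-rank factor matrices, and read predicted ratings off that product. This is a toy example, not CFT code; the rank, learning rate, regularization, and iteration count are invented for illustration, and the data is the small matrix from Table 1 (0 standing in for unknown).

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny example: 4 users x 7 songs from Table 1; 0 = unknown rating.
R = np.array([
    [3, 0, 0, 4, 1, 5, 2],
    [1, 2, 0, 2, 4, 0, 4],
    [5, 0, 4, 0, 1, 4, 0],
    [2, 0, 1, 0, 5, 0, 5],
], dtype=float)

k = 2                                            # number of latent features
P = rng.normal(scale=0.1, size=(R.shape[0], k))  # user factors
Q = rng.normal(scale=0.1, size=(R.shape[1], k))  # item factors
lr, reg = 0.02, 0.05

# Plain SGD on the squared error of the known ratings only.
# (Each factor update uses the most recent value of the other.)
for _ in range(500):
    for u, i in zip(*R.nonzero()):
        err = R[u, i] - P[u] @ Q[i]
        P[u] += lr * (err * Q[i] - reg * P[u])
        Q[i] += lr * (err * P[u] - reg * Q[i])

# Predicted rating for user A (row 0) on song3 (column 2),
# one of the unknown cells in Table 1.
print(round(float(P[0] @ Q[2]), 2))
```

The dimensionality reduction mentioned above is visible here: 18 observed ratings are summarized by 22 factor values, and predictions for the unknown cells fall out of the same dot product used to fit the known ones.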
Methodology

Since this study is seeking to understand the impact of epoch size on the accuracy of the resulting model, there will be no attempt to tune any of the algorithms. The default settings were used in order to limit the impact of user proficiency on the resulting models (Bickson 2012).

The CFT reports the accuracy of the generated model using root mean squared error (RMSE). This statistic is generated for every complete pass over the training set, and is reported for both the training and validation data. The training RMSE will not be used in this study, as it is only a reflection of how well the model performs on the training data. Instead, the validation RMSE will be the only measure reported, since it demonstrates how well the model handles the test data. Additionally, the validation RMSE will identify problems with the model, such as overfitting, which are ignored by the training algorithm.

Fault-Tolerant Algorithms (ALS, WALS, TALS, NMF, SGD, BSGD, and SVD++)

The initial epoch size will be set to the maximum iterations specified in the tutorial, usually 6 iterations. After the first training cycle, the argument --load_factors_from_file=1 will be added to the algorithm to resume training from the current state. Training cycles will continue until one of the following three conditions is met:
1. The validation RMSE no longer improves with subsequent training cycles (a minimum is reached).
2. The validation RMSE increases with additional training cycles (overfitting).
3. Multiple training cycles result in an improvement of the validation RMSE of less than .00005 per 10 iterations (diminishing returns).

Upon reaching one of the above criteria, the current state of the model will be saved and the validation RMSE and total number of training cycles recorded.

Next, the epoch size will be increased and the entire process will repeat for the new model. The training epoch will be increased on the following schedule: 6, 20, 40, 80, 100, 120, 140, 180, 200. After 200 iterations, the size of the epoch will be incremented by 50 for every subsequent increase. The training of new models on this schedule will continue until the resulting model has a higher validation RMSE than the previously generated model.

After the complete training of an algorithm, an average starting RMSE is selected and recorded as well. The starting RMSE is defined as the validation RMSE after a single iteration of the algorithm. Since there is a degree of variation inherent in all of these algorithms, it is necessary to choose a representative initial state. The starting RMSE allows a model's training regimen to be judged both by its final accuracy and by the degree of accuracy that is a result of training.

Remaining Algorithms (SVD, One-Sided SVD, RBM, TSVD++, libFM, and PMF)

Even though these algorithms do not support a multi-cyclic training regimen, they will be trained on the same epoch schedule to provide additional context.

Additional Note on Alternating Least Squares (ALS) Algorithms

Overfitting was a significant problem with the ALS algorithms, and as such some modifications to the method were made for those algorithms. Instead of stopping the training schedule with the first model resulting in a higher RMSE, all schedules were run to at least a training epoch of 80 iterations in size. This was an attempt to see if overfitting could be overcome with additional training. Overfitting also led to the inclusion of two smaller epoch sizes on the training schedule, two and ten, in order to identify whether the optimal size was located at this smaller scale.

Control Groups

One final piece of information is necessary to determine the effectiveness of a multi-cyclic training regimen: a control group. The control group was trained using a single epoch consisting of a large number of iterations. The control groups were generated after the experimental group in an attempt to limit the number of unnecessary training cycles, since large epochs require 8+ hours to run.

Waiting until after the experiment had the additional benefit of identifying a number of algorithms which did not need to be included in the control group. All of the algorithms utilizing ALS already had runs which demonstrated that even moderately sized single iterations were outperformed by multi-cyclic regimens. This meant that it was only necessary to run control groups on SGD, BSGD, NMF, and SVD++.

Each of the control groups is trained using the same parameters utilized in the experimental phase; only the progression of epoch size is altered. A starting size of 200 iterations was selected since it is within the bounds of each of the selected algorithms' total number of iterations run in the experiment. Each succeeding epoch is doubled in size up to 1,600 iterations. The next epoch is increased to 2,000 and then incremented by 1,000 every epoch after that. Training continues until the RMSE no longer decreases between runs, and actually begins to increase.

Table 2: Netflix Control Group Results

  Algorithm  # of Iterations  RMSE
  SGD        200              1.123820
  BSGD       400              1.117690
  SVD++      400              0.982024
  NMF        2000             2.370640

The Data

This study uses the same dataset featured in Bickson's blog, which is a synthetic Netflix dataset created using an anonymized sample of the original.
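One experimental run of the schedule described in the Methodology can be thought of as a driver loop around the CFT binary: train for one fixed-size epoch, save the factors, reload, and repeat until one of the three stopping conditions fires. The sketch below is a hypothetical reconstruction of that stopping logic, not code from the study: `run_epoch` stands in for invoking a CFT algorithm (with --load_factors_from_file=1 after the first cycle) and returning the new validation RMSE, and the interpretation of the ".00005 per 10 iterations" threshold is our own.

```python
def train_multi_cyclic(run_epoch, epoch_size, max_cycles=1000,
                       min_gain_per_10_iters=0.00005):
    """Repeat fixed-size training cycles until one of three conditions:
    1. the validation RMSE stops improving (a minimum is reached),
    2. the validation RMSE increases (overfitting), or
    3. the per-cycle gain, scaled to 10 iterations, falls below
       min_gain_per_10_iters (diminishing returns).
    run_epoch(epoch_size) is assumed to resume from the saved model
    state and return the new validation RMSE.
    Returns the history of validation RMSE values, one per cycle.
    """
    history = [run_epoch(epoch_size)]
    for _ in range(max_cycles - 1):
        rmse = run_epoch(epoch_size)
        if rmse >= history[-1]:
            break                  # conditions 1 and 2: flat or worse
        gain = history[-1] - rmse
        history.append(rmse)
        if gain * 10 / epoch_size < min_gain_per_10_iters:
            break                  # condition 3: diminishing returns
    return history

# Toy stand-in for a CFT run: validation RMSE decays toward 0.95.
state = {"rmse": 1.25}
def fake_epoch(n_iters):
    state["rmse"] = 0.95 + (state["rmse"] - 0.95) * 0.8
    return state["rmse"]

curve = train_multi_cyclic(fake_epoch, epoch_size=6)
print(len(curve), round(curve[-1], 4))
```

With the decaying toy curve, the driver stops on the diminishing-returns condition after a few dozen cycles rather than running to max_cycles.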
Although the dataset from the Netflix challenge is unavailable because of copyright, the general characteristics of the data are well established by the competition's creators (Bennett and Lanning 2007). The GraphLab Netflix sample was constructed to preserve these characteristics (i.e. sparsity of data, user-to-movie ratio, user-to-rating ratio, etc.) while ensuring the anonymity of the users.

The Netflix subset has the following general characteristics: 95,526 unique users, 3,561 movies, and 3,298,163 ratings (non-zeroes). This is a very sparse dataset, with less than 0.97% of the resulting matrix having ratings.

Historic Benchmarks

In order to provide some context in which to view the results of this study, the following results from the Netflix Prize (Netflix 2009) were retrieved:
• Cinematch (2006): RMSE 0.9525
• 2007 Progress Prize, "KorBell": RMSE 0.8723
• 2008 Progress Prize, "BellKor in BigChaos": RMSE 0.8627
• Winners, "BellKor's Pragmatic Chaos" (2009): RMSE 0.8567

Additionally, KorBell reported that the best result they could get from a single method was an improvement of 6.57% over Cinematch, RMSE ~0.8882 (Bell and Koren 2007).

Results

The initial impressions of the CFT suggested that the apparent boost in accuracy between training epochs would favor a training regimen consisting of a large number of very small cycles (no more than 20 iterations per epoch). This type of training would lead to results superior to those generated without it.

The first thing of note about Table 3 is that only eleven algorithms are included. Both SVD and One-Sided SVD have been left off of the table intentionally. No modification to the number of training iterations yielded any variation in how these algorithms performed: every run of these methods resulted in both an identical process and an identical model. Additionally, these algorithms utilize a different error metric than the rest of the CFT, reporting an error estimate for each of the features generated. For these reasons these algorithms were left out of the rest of the discussion of this study's results, but an example of the output of each has been included at the end of this paper as Appendix 1.

Table 3: Algorithms Ordered by Final RMSE (multi-cyclic algorithms marked with *)

  Algorithm  Initial RMSE  Final RMSE
  PMF        2.498400      0.914566
  RBM        0.979169      0.926279
  SVD++*     1.124420      0.931921
  BSGD*      1.363540      0.952970
  SGD*       1.240700      0.959890
  TSVD++     1.041220      0.995435
  LibFM      1.090030      1.025770
  TALS*      1.244250      1.147030
  ALS*       1.251550      1.159920
  NMF*       1.580120      2.375430
  WALS*      5.522080      5.325280

Looking at the results purely in terms of accuracy suggests that the fault-tolerant algorithms are generally of inferior quality. However, this view of the results is misleading. There was no effort made to tune these algorithms, or even to check whether their current settings were conducive to producing good models, before these results were generated. So while it is interesting which algorithms handled the data best, it does not really show how well these models developed over the course of training.

Ordering the results by how much a model's RMSE improved over the course of training reveals a far different picture of the fault-tolerant algorithms. The improvement in final RMSE would suggest that these algorithms are more effective at training, but it is unclear whether this is a product of the algorithm itself or of the training regimen. A closer look at all of the results, as well as the effect of the restart boost, should provide a clearer picture of the factors at work.

Table 4: Results Ordered by Percent Improved Over Initial RMSE

  Algorithm  % Improvement
  PMF        63.39%
  BSGD       30.11%
  SGD        22.63%
  SVD++      17.12%
  TALS       7.81%
  ALS        7.32%
  LibFM      5.90%
  RBM        5.40%
  TSVD++     4.40%
  WALS       3.56%
  NMF        -50.33%

Figure 1: Trendlines of Algorithm Performance (algorithmic spark lines)

Figure 1 shows the percentage difference between all of the training regimens for a given algorithm and its starting RMSE. The spark lines illustrate that while the effectiveness of the algorithms may vary, their general training behaviors are very similar. From this limited sample it would seem that there is no evidence that the restart boost creates a more effective training regimen. But this is not evidence that it has no effect.

While the cyclical training regimen fails to outperform the single-epoch algorithms, it is clearly not without its benefits. Most of the cyclically generated models have significantly higher accuracy than models learned over the course of a lone epoch with the same algorithm. All of the algorithms in the CFT suffer from the same design flaw: they fail to take the validation RMSE into account. This results in either overfitting or the algorithm becoming trapped in a local minimum, as the parameters of the algorithms become more and more restrictive. Restarting the algorithm loosens the bounds on the program, allowing it to move beyond erroneous assumptions about the data. So while the hypothesized accuracy failed to materialize, there are definitely significant advantages to this style of training with SGD, BSGD, and SVD++.

Table 5: Epoch Size and Number of Cycles Trained for Each Algorithm (multi-cyclic algorithms marked with *)

  Algorithm  Epoch Size (Iterations)  Training Cycles  Total Running Time (sec)
  PMF        180                      -                2,501.9900
  RBM        80                       -                692.3140
  SVD++*     40                       7                691.6511
  BSGD*      40                       92               6,860.8908
  SGD*       40                       59               3,127.5310
  TSVD++     80                       -                251.9750
  LibFM      200                      -                908.0320
  TALS*      10                       1                74.6055
  ALS*       2                        4                47.4264
  NMF*       40                       56               9,045.0080
  WALS*      10                       1                66.1071

The idea that a cyclical training regimen is superior to a single-epoch system has been thoroughly disproved, but what about the ideal size for these epochs?
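The percentages in Table 4 follow directly from Table 3: improvement is the drop from initial to final validation RMSE, expressed as a fraction of the initial value. A quick check in Python, using values copied from Table 3 for the four control-group algorithms plus PMF:

```python
# (initial RMSE, final RMSE) pairs from Table 3.
results = {
    "PMF": (2.498400, 0.914566),
    "BSGD": (1.363540, 0.952970),
    "SGD": (1.240700, 0.959890),
    "SVD++": (1.124420, 0.931921),
    "NMF": (1.580120, 2.375430),
}

def pct_improvement(initial, final):
    """Relative reduction in validation RMSE, as a percentage.
    Negative values (e.g. NMF) mean the final model got worse."""
    return 100 * (initial - final) / initial

for name, (initial, final) in results.items():
    print(f"{name}: {pct_improvement(initial, final):.2f}%")
# PMF: 63.39%, BSGD: 30.11%, SGD: 22.63%, SVD++: 17.12%, NMF: -50.33%
```

The recomputed values match the Table 4 figures to two decimal places.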
Table 5 shows, rather conclusively, that the ideal number of iterations is larger than the 20-iteration ceiling that had been hypothesized. Each cycle needs to be large enough to take advantage of as many positive iterations as possible (those that reduce the validation RMSE), while minimizing the number of overfitted iterations (a common occurrence in later training cycles).

Table 9: Final RMSE of Control and Experimental Models

  Algorithm  Control RMSE  Test RMSE  Improvement
  SGD        1.123820      0.959890   14.59%
  BSGD       1.117690      0.952970   14.74%
  SVD++      0.982024      0.931921   5.10%
  NMF        2.370640      2.375430   -0.20%

Conclusion and Further Study

This study is only meant to serve as an initial foray into the algorithms represented by the CFT, and a great deal remains to be researched. However, there are questions that relate directly to the results of this study which would be logical next steps. The first is to determine whether the results here are a product of the algorithms or of the data. This study needs to be repeated on a number of other datasets to see if similar results are generated. Similar training patterns would suggest that the findings of this study are a result of the algorithms used and not some feature of the dataset. Secondly, a study should determine whether tuning the algorithms has an effect on training patterns.

The CFT represents a powerful tool for bringing compact, easily executed data analysis to a variety of ventures. Further experimentation with the package will illuminate how it, and the algorithms it represents, can be put to the best use.

Appendix 1: SVD and One-Sided SVD Output

./toolkits/collaborative_filtering/svd --training=smallnetflix_mm --nsv=3 --nv=10 --max_iter=5 --quiet=1 --tol=1e-1
WARNING: common.hpp(print_copyright:104): GraphChi Collaborative filtering library is written by Danny Bickson (c). Send any comments or bug reports to danny.bickson@gmail.com
[training] => [smallnetflix_mm]
[nsv] => [3]
[nv] => [10]
[max_iter] => [5]
[quiet] => [1]
[tol] => [1e-1]
Load matrix smallnetflix_mm
Starting iteration: 1 at time: 2.71863
Starting step: 1 at time: 3.42417
Starting step: 2 at time: 4.6307
Starting step: 3 at time: 5.88049
Starting step: 4 at time: 7.11613
Starting step: 5 at time: 8.36737
Starting step: 6 at time: 9.63118
Starting step: 7 at time: 10.9138
Starting step: 8 at time: 12.2115
Starting step: 9 at time: 13.5747
set status to tol
set status to tol
set status to tol
set status to tol
set status to tol
set status to tol
set status to tol
set status to tol
Number of computed signular values 5
Singular value 0 3276.69 Error estimate: 0.000305186
Singular value 1 1064.07 Error estimate: 1.18507e-13
Singular value 2 956.541 Error estimate: 0.00162432
Singular value 3 889.028 Error estimate: 0.00841469
Singular value 4 710.42 Error estimate: 0.0551811
Going to save output vectors U and V
Lanczos finished 22.5142

./toolkits/collaborative_filtering/svd_onesided --training=smallnetflix_mm --nsv=3 --nv=10 --max_iter=5 --quiet=1 --tol=1e-1
WARNING: common.hpp(print_copyright:104): GraphChi Collaborative filtering library is written by Danny Bickson (c). Send any comments or bug reports to danny.bickson@gmail.com
[training] => [smallnetflix_mm]
[nsv] => [3]
[nv] => [10]
[max_iter] => [5]
[quiet] => [1]
[tol] => [1e-1]
Load matrix smallnetflix_mm
Starting iteration: 1 at time: 0.560262
Starting step: 1 at time: 1.22546
Starting step: 2 at time: 4.76345
Starting step: 3 at time: 8.32286
Starting step: 4 at time: 11.8127
Starting step: 5 at time: 15.4576
Starting step: 6 at time: 19.1342
Starting step: 7 at time: 22.8375
Starting step: 8 at time: 26.5084
Starting step: 9 at time: 30.1965
set status to tol
set status to tol
set status to tol
set status to tol
set status to tol
set status to tol
set status to tol
set status to tol
Number of computed signular values 5
Singular value 0 3276.69 Error estimate: 3.8991e-14
Singular value 1 1064.07 Error estimate: 0.00146017
Singular value 2 956.541 Error estimate: 0.00782078
Singular value 3 889.028 Error estimate: 0.0440951
Singular value 4 710.42 Error estimate: 0.0864268
Lanczos finished in 47.1228

References

Bell, R. M.; Koren, Y. (2007). "Lessons from the Netflix Prize Challenge." SIGKDD Explorations, Vol. 9, Issue 2, 75-79. ACM, New York, NY.

Bennett, J.; Lanning, S. (2007). "The Netflix Prize." In Proceedings of KDD Cup and Workshop 2007, 3-6. ACM, New York, NY.

Bickson, D. (2012). "Collaborative filtering with GraphChi." Large Scale Machine Learning and Other Animals. Accessed December 5, 2012. http://bickson.blogspot.com/2012/08/collaborative-filtering-with-demographical

Comon, P.; Luciani, X.; de Almeida, A. L. F. (2009). "Tensor Decompositions, Alternating Least Squares and other Tales." Special issue, Journal of Chemometrics. In memory of R. Harshman.

Hernandez, V.; Roman, J. E.; Tomas, A. (2007). "Restarted Lanczos Bidiagonalization for the SVD in SLEPc." SLEPc Technical Report STR-8.

Hinton, G. (2010). "A Practical Guide to Training Restricted Boltzmann Machines." University of Toronto Tech Report UTML TR 2010-003.

Hu, Y.; Koren, Y.; Volinsky, C. (2008). "Collaborative Filtering for Implicit Feedback Datasets." IEEE International Conference on Data Mining (ICDM 2008). IEEE, Washington, DC.

Koren, Y. (2009). "Collaborative filtering with temporal dynamics." In Proceedings of the 15th ACM SIGKDD, 447-456. ACM, New York, NY.

Koren, Y. (2008). "Factorization Meets the Neighborhood: a Multifaceted Collaborative Filtering Model." ACM SIGKDD. ACM, New York, NY.

Koren, Y.; Bell, R.; Volinsky, C. (2009). "Matrix Factorization Techniques for Recommender Systems." IEEE Computer, Vol. 42, No. 8 (August 2009), 30-37. IEEE, Washington, DC.

Kyrola, A.; Blelloch, G.; Guestrin, C. (2012). "GraphChi: Large-Scale Graph Computation on Just a PC." In Proceedings of the Tenth USENIX Symposium on Operating Systems Design and Implementation (OSDI), 31-46. Hollywood, CA.

Lee, D. D.; Seung, H. S. (2001). "Algorithms for Non-negative Matrix Factorization." Advances in Neural Information Processing Systems 13, 556-562.

Low, Y.; Gonzalez, J.; Kyrola, A.; Bickson, D.; Guestrin, C.; Hellerstein, J. M. (2010). "GraphLab: A New Parallel Framework for Machine Learning." In Proceedings of the Twenty-Sixth Conference on Uncertainty in Artificial Intelligence. AAAI Press, Palo Alto, CA.

Netflix (2009). "Leaderboard." NetflixPrize.com. Accessed January 5, 2013. http://www.netflixprize.com//leaderboard

Pearl, J. (1994). "A Probabilistic Calculus of Actions." In UAI'94: Proceedings of the Tenth International Conference on Uncertainty in Artificial Intelligence, 454-462. Morgan Kaufmann, San Mateo, CA.

Rendle, S. (2010). "Factorization Machines." In Proceedings of the 10th IEEE International Conference on Data Mining (ICDM 2010), Sydney, Australia. IEEE, Washington, DC.

Salakhutdinov, R.; Mnih, A. (2008). "Bayesian Probabilistic Matrix Factorization using Markov Chain Monte Carlo." In Proceedings of the International Conference on Machine Learning.

Walia, R. R. (2008). "Collaborative Filtering: A comparison of graph-based semi-supervised learning methods and memory-based methods." Strengthening the Role of ICT in Development, 70-82.

Witten, I. H.; Frank, E.; Hall, M. A. (2011). Data Mining: Practical Machine Learning Tools and Techniques, 3rd ed. Morgan Kaufmann, San Mateo, CA.

Zhou, Y.; Wilkinson, D.; Schreiber, R.; Pan, R. (2008). "Large-Scale Parallel Collaborative Filtering for the Netflix Prize." In Proceedings of the 4th International Conference on Algorithmic Aspects in Information and Management, 337-348. Shanghai, China.