organized as follows: Section 2 describes news recommender systems, their details, and their challenges. Section 3 explains the different dimensions of user profiles and the machine learning techniques used to build them, and summarizes the features of user profiles with respect to each learning technique. Section 4 discusses how filtering techniques are applied in content-based, collaborative, and different kinds of hybrid systems, and illustrates the classification of machine learning techniques together with the problems they address. The conclusions are presented in Section 5.

2. News Recommender Systems

News recommender systems share many features with information retrieval systems and with human-computer interaction. Text mining techniques for large-scale data sets are needed, and machine learning methods are employed when learning cycles can be built into the systems. In general there are three steps. First, data pre-processing such as sampling, dimension reduction, and denoising with the use of similarity functions is normally applied. Then the text is analyzed through supervised or unsupervised machine learning techniques, depending on the availability of training data sets. Finally, the result is evaluated through, for example, the F1-measure, ROC, or MAE [1].

If we consider news recommender systems as search engines, the user profiles can be regarded as long search queries. The system ranks the results on the basis of how well the profile matches the descriptions of the news articles. Formally, the appropriateness of recommended news to the user can be described by the following utility function [1]:

$$u : C \times S \rightarrow R$$

This function assigns a score $r \in R$ to each combination of user $c \in C$ and news story $s \in S$. Matrix $C$ indicates the characteristics of the users, and $S$ shows the different specifications of available articles such as topic, location, news agency, date, and other useful attributes. All the different algorithms in recommender systems try to maximize the result matrix $R$. Each entry of $R$ could be any non-negative value in an interval such as 0 to 1 or 0 to 100, depending on the system definition. In the end, an article that maximizes the utility function will be recommended [1]:

$$\forall c \in C, \quad s'_c = \arg\max_{s \in S} u(c, s)$$

News recommender systems differ from other recommenders in the structure of their items. News articles do not follow any specific format. Many news articles are published each day and have very short life spans, while the system must scale to deal with huge volumes of data. Besides, the news recommender system must always recommend interesting articles to the user, though it should not over-specialize for the target user [2].

3. User Profiles

The desired user profiles need to have a changing essence and flexible content. These profiles show the users' preferences towards news articles by modeling the articles they find interesting. Besides, storing user interactions is a basis for knowing which favorite topics last longer and which last only for a short period of time. This model includes meta-data such as time and location, which change according to the user behavior.

The content of the user profile for this kind of recommender system, which does not have a very structured format, is different from others. In order to have an exact and practical model of the user profiles, the system needs to know the behavior of the user, including background, interests, and goals. These features change over time, so considering contextual parameters such as time and location is crucial [3].

There are three major representations of terms in the user profile. The first approach represents terms as vectors in a vector space model. In order to weigh every single word correctly based on its frequency in each document and in the collection of documents, TF-IDF is often applied. This measure puts more emphasis on a word that appears frequently in one specific document and not in the others. Such a word gains more weight, and the corresponding document will be retrieved for the target user. But the problems of polysemy (multiple meanings for one word) and synonymy (multiple words for an identical meaning) remain. The desired approach reflects cultural and linguistic knowledge of terms and can also reason about their content. As a result, the representation is more intelligent than a simple bag of words and can provide knowledge about the desired terms [1]. The second approach analyzes words as entities. Entities have meanings and relations, but they suffer from generalization or specialization problems since there are no hierarchical relationships among them [3]. The third approach is ontology-based semantic analysis. It has hierarchical relationships between the semantic concepts that model user interests. The terms that indicate user interests, both those that last longer and those that appear only for a short time, can be enriched by semantic approaches. The advantage of providing ontologies for user interests is that all the terms or entities are in hierarchical relationships, which give more specific detail of user interests alongside the general ones [3]. The semantic enrichment can benefit from encyclopedic knowledge besides the knowledge in the documents at hand. So the terms are semantic vectors in a word space model [1]. Each of them is indexed by its weight but is later interpreted semantically by using Wikipedia. This is called Explicit Semantic Analysis (ESA) [4].

User feedback is the other approach to user modeling. In general, users can communicate their interest in the news explicitly or implicitly. Explicit feedback means stating interest (or disaffection) directly to the system, through actions such as rating, liking, or filling in a survey in the interface of the application. Implicit feedback includes interactions such as clicking on articles (touching on a mobile device), scrolling through articles with a mouse or keyboard (swiping on a mobile device), printing or saving articles, copying and pasting part or all of an article, reading articles, forwarding or sharing articles, and writing qualitative comments on an article.

Recommender systems are highly dependent on user feedback. As long as the user interacts with the application, the accuracy of the system may gradually improve. Explicit feedback tends to produce more exact user profiles than implicit feedback. Unfortunately, not all users are willing to spend time providing such feedback, so the implicit signals of the users are normally the basis of the recommendation [5].

Specifying the type of the user's interest can help the system cover all domains of their attention. The long-term interest depends more on the user's profession and personal background than on what can be traced in the log history. The short-term interest is mostly related to the current trends among the public that the user communicates with. Depending on the user's goals, however, the long-term interest will also change gradually. Besides, supervising the context of the user's attention can provide good evidence to capture the short-term interest and update the long-term interest over time. In [6] the user's current interest is captured by defining a running context over category and topic. The old user profile, which indicates the long-term interest, is updated progressively if it has nothing in common with the current focus. Besides, there should be a balanced focus on the old and the new user profile. While keeping the old user profile and overlooking the context results in dissatisfaction, giving too much priority to the current context will not cover the news articles that are related to the user's background and form the basis of their interest.

In addition, the time of day (morning, evening) and of the week (weekdays, weekend) can affect the user profile [7]. Considering the topic of the news articles, target users may have different desires at different times. For example, a user might be more interested in politics and economics on weekdays and focus more on lifestyle news at the weekend [8].

While personalizing the news is desirable, the importance of public trends is not negligible. In [9] public trends, based on the frequency of user clicks, are used to provide interesting news articles as well. If there are not enough clicks from the user, then the public trend at the user's location is a good indicator for recommending news. This dimension of the user profile, the location, has a key role in recommending news articles. Short-term interests of the user are highly dependent on their location. Location can capture public trends and find similar networks of users as well. Sometimes ignoring the user profile and focusing on the context is helpful (in economic news the user profile is not very helpful, but tracing the context is more informative), while at other times it is better to rely only on the user profile (for the entertainment section, user profile enrichment is much better than context) [10].

As the amount of data explodes, the importance of extracting models and predicting unseen data with machine learning techniques is increasing [11]. There are two major types of learning techniques, supervised and unsupervised. In the former, an annotated training dataset is provided, whereas in the latter the machine explores the data to identify interesting patterns without training data. Below is the list of supervised learning techniques used in recommender systems:

• Decision Trees (C4.5 or CART) handle categorical-nominal and heterogeneous data. They are also able to cope with missing values. Overfitting is addressed through pre-pruning. They tend to work well with small datasets, though the cost of decisions on continuous data streams is high [11, 12].
• Rule-based (RIPPER) can handle multi-valued features very well. It is decision tree-based and uses rules to categorize new items. It utilizes post-pruning to find the best fit for the rule set [13].
• K Nearest Neighbor (KNN) can handle continuous data through Euclidean, Manhattan, or Minkowski distance and cope with categorical data through Hamming distance. It is a lazy learner that works well with few instances [14, 15].
• Rocchio and Relevance Feedback: the user profile is regarded as a query [16], and the recommendation improves over time based on the implicit feedback of the user.
• Support Vector Machine (SVM): SVM reduces sensitivity to noise and increases generalization. For non-linear problems, if there are more features than instances, a linear kernel is good enough [16, 17].
• Probabilistic methods and Naive Bayes: The Bayesian Belief Network with conditional independence is the most applicable one. Multinomial and multivariate Bernoulli are the two types of Naive Bayes. While the Bernoulli model checks the absence or presence of a term, the multinomial model counts the number of occurrences of a term [18, 19].
• Neural Network: Single-layer perceptrons, and multi-layer perceptrons for problems that are not linearly separable, are examples of neural networks applied in recommender systems [20].

Below is the list of unsupervised learning techniques:

• Probabilistic methods: If the structure of the Bayesian network is not known, the Bayesian DAG can be built with a scoring function, constraint-based learning, or conditional independence tests. The last one is the most efficient [21]. Other techniques such as Bayesian Hidden Score (pairwise learning) and graph-based learning have been applied in [22].
• Neural Network: The Self Organizing Map (Kohonen) and the Restricted Boltzmann Machine belong to the category of unsupervised learning [20].
• Clustering: Flat clustering by the k-means algorithm deals with categorical data, and the most frequent term becomes the centroid. In hierarchical clustering, the other type of clustering, divisive clustering is more accurate than agglomerative clustering. There are two approaches to labeling clusters. The first one is differential labeling, in which the label with the higher score is chosen through feature selection. The second one is cluster-internal labeling, in which the term closest to the title, or with the highest weight in the centroid of the cluster, is chosen as the label. The drawback of cluster-internal labeling is its inability to distinguish between words that are frequent across all clusters and words that are frequent only in one specific cluster. Labeling in hierarchical clustering is more complicated because of the dependent definitions of parent, child, and sibling [16].

Table 1 shows the machine learning techniques applied to build up a user profile.

4. Applying User Profiles in Recommender Systems

There are different approaches to filtering information. Content-based and collaborative filtering are the most widely applicable ones. In content-based filtering, the content of news articles is analyzed. Then, according to the content of the user profile (i.e. the characteristics of the articles read), similar articles are predicted and presented to the user. In content-based filtering, the utility function is:

$$u(c, s) = \mathrm{score}(\mathrm{UserProfile}(c), \mathrm{ItemProfile}(s))$$

If the content of both the user profile and the item profile is represented by TF-IDF weights, then the scoring function can be calculated as the cosine similarity of the weight vectors. To achieve accurate predictions, the choice of news article attributes is important. Since news articles are unstructured by nature, extracting relevant and important features has a key role in content-based filtering.
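The TF-IDF and cosine-similarity scoring described above can be illustrated with a minimal Python sketch. It is not the implementation of any system cited here. The function names, the toy articles, the particular TF-IDF variant (relative term frequency times log(N/df)), and the choice to build the user profile as the average of the vectors of read articles are all assumptions made for illustration.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Map each tokenised document to a sparse {term: TF-IDF weight} dict."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        # Assumed variant: relative term frequency times log(N / df).
        vectors.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity of two sparse vectors; 0.0 if either has zero norm."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_profile(read_vectors):
    """Hypothetical user profile: the mean TF-IDF vector of the read articles."""
    profile = Counter()
    for vec in read_vectors:
        for term, w in vec.items():
            profile[term] += w / len(read_vectors)
    return dict(profile)


def recommend(profile, candidates):
    """Return candidate ids ordered by similarity to the profile, best first."""
    return sorted(candidates,
                  key=lambda cid: cosine(profile, candidates[cid]),
                  reverse=True)


# Toy corpus (invented): the user has read a0 and a1; a2 and a3 are candidates.
articles = {
    "a0": "election vote parliament".split(),
    "a1": "parliament budget vote".split(),
    "a2": "election budget debate".split(),
    "a3": "football match goal".split(),
}
ids = list(articles)
vectors = dict(zip(ids, tf_idf_vectors([articles[i] for i in ids])))
profile = build_profile([vectors["a0"], vectors["a1"]])
candidates = {i: vectors[i] for i in ("a2", "a3")}
# a2 shares "election" and "budget" with the profile; a3 shares no term.
ranking = recommend(profile, candidates)
```

In this toy setting the politics article is ranked ahead of the sports article because only the former shares weighted terms with the profile. The sketch also makes the over-specialization risk visible: a candidate with no term overlap always scores zero, however newsworthy it is.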
If the articles are categorized with minimum misclassification error, then storing the interesting news articles in the user profile is much easier and, consequently, recommendations are of higher quality. Bayesian networks can be utilized well for learning user profiles based on the articles that have been read. They can model user profiles by ignoring missing data and considering conditional dependencies within one specific category of news articles. Their nodes provide probabilities for each attribute of an article. The modeled domain includes continuous data. Then the similarity between the user profile, based on predicted article attributes, and the available news articles is computed, and the articles with the highest scores are recommended. If another technique such as Naive Bayes (Bernoulli model) is applied for modeling user behavior, the output is binary, as it considers the absence or presence of terms regardless of their conditional dependencies [1]. It can suggest a new item to the target user by comparing the new item's characteristics to the terms in the user's profile. But if there are not enough attributes, content-based filtering is normally not the most efficient approach. If the user is new to the system, it cannot recommend anything, as no profile content is available. Besides, it causes a lack of serendipity by providing too many similar news articles to the user.

Table 1. ML techniques and features of user profiles

| ML Techniques | User Profile Features |
|---|---|
| Decision Tree (C4.5) | Semantic enrichment can be handled at entity level, but at the beginning of building the user profile or for capturing short-term interest [13, 23]. |
| Rule-based (RIPPER) | Semantic enrichment can be handled at entity level. More interesting categories of news may be predicted through rules [1]. |
| KNN | Captures the short-term interest of the user and the popularity of the item among a group of users. |
| Rocchio and Relevance Feedback | User profiles are regarded as queries; the system improves over time from the relevance feedback of the user [16]. |
| Support Vector Machine | It outperforms KNN, C4.5, and Rocchio on the Reuters dataset [16]. |
| Probabilistic methods and Naive Bayes | Bernoulli works well with small datasets and multinomial works well with large datasets. The DAG captures the dependency of items, capturing interest in more detail; it is robust towards missing data and can disregard noisy data. BHS and graph-based learning capture the online interest of the user [22]. |
| Neural Network | It can represent details of the user's interest through deep learning with a three-layer perceptron [24]. |
| Clustering | The content of the items is clustered and then item-based collaborative filtering is implemented on the output. Fuzzy membership over k-means. Similarity of the item-rating matrix and the group-rating matrix (MovieLens). Hierarchical clustering for the news groups (LDA for small datasets and PLSI for large datasets) [25]. |

Considering the collaborative approach to filtering information, there are two different models, memory-based and model-based. Memory-based filtering utilizes the log history of all users and puts the top-N similar users who have the same taste in news articles into one specific group. Then, to provide the latest and most interesting news articles to the target user, it identifies users with the same interests and recommends the new articles that they have read. It works with a matrix of user profiles and all the news articles. It is possible to apply K Nearest Neighbor (through neighborhood measurement) to find the closest users to the current active user. The other approach is to apply a similarity measure such as cosine similarity or Pearson correlation, which suggests a new item to the target user if it is similar to previously chosen items. This helps find similar users or items with regard to the context of memory [23].

The other type of collaborative filtering is model-based. It is more scalable and much faster than memory-based collaborative filtering. With this type of filtering not all of the dataset is traced and investigated, but only some information is modeled. As finding the similarity between users or news articles (users with the same interest in specific news, or two similar news articles that are interesting to one specific user) is not feasible due to the lack of labeled data in the training phase, clustering of news or users can be a practical solution. With the Google News dataset, clustering is done on the basis of users' clicks on different news articles. Through clustering, latent factors (latent semantic analysis) can be revealed. Ignoring these hidden values results in very poor accuracy, so it is helpful to identify hidden variables through clustering and thereby provide more accurate predictions of news articles [23]. One technique to implement this approach is matrix factorization of the user-item matrix. The matrix of users and news articles suffers from sparsity, since there are many positions for which users do not provide any feedback. To find the hidden variables that affect the recommendation as well, UV decomposition (an instance of Singular Value Decomposition) can be applied. If the utility matrix is $M$ of size $n \times m$ ($n$ indicates the number of users and $m$ the number of news articles), then UV decomposition expresses it as the product of two matrices, $U$ of size $n \times d$ and $V$ of size $d \times m$:

$$M \approx U V$$

RMSE is a common tool for measuring the accuracy with which the blank entries in $M$ are predicted from the product $UV$. Although model-based filtering works much faster than memory-based filtering, it is less exact. In spite of all the different applicable approaches to collaborative filtering, it cannot make accurate predictions for a new user or a new item (the cold-start problem). The core of all the algorithms depends on a group of users (or items) in order to find the proper match for the target user. Consequently, it has nothing to present to a user with unique taste.

As each of these filtering techniques has its own problems and challenges in recommender systems, a hybrid system is often preferred. It applies both kinds of filtering in predefined steps and can overcome the drawbacks of each. Considering the two filtering techniques (content-based and collaborative), the order in which they are combined might be important when building a hybrid system, although in some hybridization techniques the order does not matter. The techniques in which the order is not important are [26]:

Mixed: The results from both techniques are presented in one grouped list or in separate lists. It has been utilized in [27] to recommend TV shows to users. The mixed hybrid system provides recommendations based on the characteristics of each show and the preferences of other users.

Weighted: A score is computed by each technique, and the weighted combination of the scores is the basis for the recommendation. In personalized Tango (P-Tango) for online newspapers, equal weights are initially assigned to both filtering techniques. Each weight is then gradually adjusted according to the user ratings. Based on the ratings, the absolute error is computed, and it decreases as the recommendations improve.

Switching: This technique uses some criterion to switch between filtering techniques and recommends items based on the chosen filter. In the DailyLearner switching hybrid system, content-based filtering with k nearest neighbor is applied first. If it does not produce sufficient recommendations, collaborative filtering takes advantage of similar users' interests to recommend desired items. In another system, item-based collaborative filtering is triggered if the accuracy of the content-based filtering part is low [28].

Feature combination: This technique treats the output of one filtering type, such as collaborative filtering, as additional feature data. Then content-based filtering is applied. With this kind of hybrid system, the absolute dependency on users is dropped by applying collaborative filtering as a feature combination. In the movie recommender domain [29], the RIPPER algorithm is applied to item features and user ratings.

There are three other models of hybrid systems in which the order is part of their intrinsic structure:

Feature augmentation: One of the filtering techniques is applied to compute rating scores or to classify items. The output of this filtering is the input for the other filtering technique. In the Libra system, content-based filtering through Naïve Bayes is done on data that comes from Amazon. The Amazon data that show related authors and titles were generated using collaborative filtering, so collaborative filtering is done first.

Meta-level: It provides a model built by one of the filtering methods as an input for the other one. The input is the complete model itself, not the output of a learned model as in feature augmentation. In Fab [30], collections of items (matching the needs of users within the mass of data on the web) are first composed by means of relevance feedback and the Rocchio algorithm (content-based). K-nearest neighbor is then used with collaborative filtering to complete the recommendations. Meta-level is the only ordered technique that applies content-based filtering first.

Cascade: Similar to the other ordered techniques, it refines the candidates that have been filtered by the previous technique. But if items have very low priority in the first filtering stage, they will not enter the second stage. In fact, the second filtering step is only applied to provide more accurate recommendations, and an item without a sufficient rating score will not be in the second phase. Fab [31] is an example of this technique. With collaborative filtering in the selection stage, the items are chosen with an exact score and presented to the user.

Among the hybrid systems implemented for news recommendation (such as DailyLearner), the switching scheme is the most common strategy. It can start with content-based filtering, utilizing Naive Bayes to categorize the news articles based on their content, and then apply item-based collaborative filtering to calculate the similarity between the news articles and the user profile. On the other hand, it is also possible to first apply collaborative filtering to find the closest users to the active user (through KNN) and then use content-based filtering to identify more similar items based on the similarity between the user profile and the news articles.

Table 2 shows the machine learning techniques applied to deal with the issues of news recommender systems.

Table 2. Machine learning techniques and challenges addressed

| ML Techniques | Challenges addressed in news recommender systems |
|---|---|
| Decision Tree (C4.5) | Capturing short-term interest [1]. |
| Rule-based (RIPPER) | Serendipity can be supported with new category reasoning [32]. |
| KNN | Short-term interests; providing the latest news to the user based on their interests [1]. |
| Rocchio and Relevance Feedback | Handling the long-term interest of the user [1]. |
| Support Vector Machine | Sparsity problem and huge data volumes after long-term usage of the application [33]. |
| Probabilistic methods and Naive Bayes | Handling the long-term interest of the user; sparsity problem; noisy data; cold start; precise interest of the user [28]. |
| Neural Network | Short-term and long-term interest [34]. A tied Boltzmann machine with a residual parameter could outperform a simple collaborative filtering method (Pearson correlation for the items) on the non-cold-start problem. It is also competitive with content-based filtering on the cold-start problem (Netflix). Changing interest of the user [24]. |
| Clustering | Cold start. Through fuzzy membership, new and interesting news articles can be presented to the user [25]. |

5. Conclusion

The news recommender system is somewhat different from other recommender systems. It is used to provide a variety of personalized news articles that have very short life spans. In addition, the range of the user's interests is wide and changes over time and across contexts. These characteristics necessitate very dynamic analyses of user profiles.

In this paper the distinguishing characteristics that affect recommendation strategies are assessed. The user feedback on recommended items is one of them. Different machine learning algorithms (falling into the categories of supervised and unsupervised learning) are discussed for building up user profiles. On the other hand, as the user profile depends on the whole framework of filtering methods, these techniques are also studied. They utilize user profiles in diverse ways, which affects the accuracy of the corresponding recommendations.

References

[1] Ricci, F., et al., Recommender Systems Handbook. 2010: Springer-Verlag New York, Inc. 842.
[2] Özgöbek, Ö., J. A. Gulla, and R. C. Erdur, A Survey on Challenges and Methods in News Recommendation, in Proceedings of the 10th International Conference on Web Information System and Technologies. April 2014: Barcelona.
[3] Bouneffouf, D., Towards User Profile Modelling in Recommender System. 2013.
[4] Gabrilovich, E. and S. Markovitch, Computing semantic relatedness using Wikipedia-based explicit semantic analysis, in Proceedings of the 20th international joint conference on Artificial intelligence. 2007, Morgan Kaufmann Publishers Inc.: Hyderabad, India. p. 1606-1611.
[5] Jon Atle Gulla, J.E.I., Arne Dag Fidjestøl, John Eirik Nilsen, Kent Robin Haugen, and Xiaomeng Su, Learning User Profiles in Mobile News Recommendation. Journal of Print and Media Technology Research, September 2013. Vol II, No. 3: p. 183-194.
[6] Jon Atle Gulla, A.D.F., Xiaomeng Su and Humberto Castejon, Implicit User Profiling in News Recommender Systems, in Proceedings of the 10th International Conference on Web Information System and Technologies. April 2014: Barcelona.
[7] Abel, F., et al., Analyzing user modeling on twitter for personalized news recommendations, in Proceedings of the 19th international conference on User modeling, adaption, and personalization. 2011, Springer-Verlag: Girona, Spain. p. 1-12.
[8] Adomavicius, G. and A. Tuzhilin, Context-aware recommender systems, in Proceedings of the 2008 ACM conference on Recommender systems. 2008, ACM: Lausanne, Switzerland. p. 335-336.
[9] Liu, J., et al., Personalized news recommendation based on click behavior, in Proceedings of the 15th international conference on Intelligent user interfaces. 2010, ACM: Hong Kong, China. p. 31-40.
[10] Bellogín, A., et al., Discovering Relevant Preferences in a Personalised Recommender System using Machine Learning Techniques, in Preference Learning Workshop (PL 2008), at the 8th European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD 2008). 2008.
[11] Witten, I.H., E. Frank, and M.A. Hall, Data Mining: Practical Machine Learning Tools and Techniques. 2011: Morgan Kaufmann Publishers Inc. 664.
[12] VENKATADRI.M, L.C.R., A Comparative Study On Decision Tree Classification Algorithms In Data Mining. 2010; Available from: https://www.academia.edu/1374211/A_Comparative_Study_On_Decision_Tree_Classification_Algorithms_In_Data_Mining.
[13] Pazzani, M.J. and D. Billsus, Content-based recommendation systems, in The adaptive web, B. Peter, K. Alfred, and N. Wolfgang, Editors. 2007, Springer-Verlag. p. 325-341.
[14] Deokar, S., WEIGHTED K NEAREST NEIGHBOR. 2009.
[15] Webb, G., M. Pazzani, and D. Billsus, Machine Learning for User Modeling. User Modeling and User-Adapted Interaction, 2001. 11(1-2): p. 19-29.
[16] Manning, C.D., et al., Introduction to Information Retrieval. 2008: Cambridge University Press. 496.
[17] Prügel-Bennett, M.A.G.a.A., Building Switching Hybrid Recommender System Using Machine Learning Classifiers and Collaborative Filtering. IAENG International Journal of Computer Science.
[18] Margaritis, D., Learning Bayesian Network Model Structure. 2003, University of Pittsburgh.
[19] Barber, D., Bayesian Reasoning and Machine Learning. 2010.
[20] Peretto, P., An Introduction to the Modeling of Neural Networks. 1992: Cambridge University Press.
[21] Kotsiantis, S.B., Supervised Machine Learning: A Review of Classification Techniques, in Proceedings of the 2007 conference on Emerging Artificial Intelligence Applications in Computer Engineering: Real Word AI Systems with Applications in eHealth, HCI, Information Retrieval and Pervasive Technologies. 2007, IOS Press. p. 3-24.
[22] Bian, J., et al., Exploiting User Preference for Online Learning in Web Content Optimization Systems. ACM Trans. Intell. Syst. Technol., 2014. 5(2): p. 1-23.
[23] Rajaraman, A. and J.D. Ullman, Mining of Massive Datasets. 2011: Cambridge University Press. 326.
[24] Gunawardana, A. and C. Meek, A unified approach to building hybrid recommender systems, in Proceedings of the third ACM conference on Recommender systems. 2009, ACM: New York, New York, USA. p. 117-124.
[25] Mouton, C., Unsupervised Word Sense Induction from Multiple Semantic Spaces with Locality Sensitive Hashing, in International Conference RANLP. 2009.
[26] Burke, R., Hybrid Recommender Systems: Survey and Experiments. User Modeling and User-Adapted Interaction, 2002. 12(4): p. 331-370.
[27] Cotter, P. and B. Smyth, PTV: Intelligent Personalised TV Guides, in Proceedings of the Seventeenth National Conference on Artificial Intelligence and Twelfth Conference on Innovative Applications of Artificial Intelligence. 2000, AAAI Press. p. 957-964.
[28] Ghazanfar, M.A. and A. Prugel-Bennett, An Improved Switching Hybrid Recommender System Using Naive Bayes Classifier and Collaborative Filtering. International Multiconference of Engineers and Computer Scientists (Imecs 2010), Vols I-Iii, 2010: p. 493-502.
[29] Basu, C., H. Hirsh, and W. Cohen, Recommendation as classification: using social and content-based information in recommendation, in Proceedings of the fifteenth national/tenth conference on Artificial intelligence/Innovative applications of artificial intelligence. 1998, American Association for Artificial Intelligence: Madison, Wisconsin, USA. p. 714-720.
[30] Balabanović, M. and Y. Shoham, Fab: content-based, collaborative recommendation. Commun. ACM, 1997. 40(3): p. 66-72.
[31] Balabanović, M., An adaptive Web page recommendation service, in Proceedings of the first international conference on Autonomous agents. 1997, ACM: Marina del Rey, California, USA. p. 378-385.
[32] Markward Britsch, N.G., Michael Schmelling, Application of the rule-growing algorithm RIPPER to particle physics analysis. 2008.
[33] Anatole Gershman, T.W., Eugene Fink, Jaime Carbonell, News Personalization using Support Vector Machines.
[34] Kyo-Joong Oh, W.-J.L., Chae-Gyun Lim, Ho-Jin Choi, Personalized News Recommendation using Classified Keywords to Capture User Preference, in Advanced Communication Technology (ICACT). 2014.