Constructing CP-nets from Users' Past Behaviors

Reza Khoshkangini (University of Padova, Padova, Italy, khosh@math.unipd.it), Maria Silvia Pini (Department of Information Engineering, University of Padova, Padova, Italy, pini@dei.unipd.it), Francesca Rossi (IBM T. J. Watson Research Center, Yorktown Heights, NY, USA, on leave from University of Padova), Dinh Van Tran (University of Padova, Padova, Italy, dinh@math.unipd.it)

ABSTRACT
While recommender systems have significantly improved the task of finding and providing the best service for users in various domains, there are still limitations in extracting users' preferences from their behaviors when they deal with a specific service provider. In this paper we propose a framework to automatically extract and learn users' conditional and qualitative preferences from their past behavior, without asking the users for any information. To do that, we construct a CP-net modeling the users' preferences via a procedure that employs multiple Information Criterion score functions within a heuristic algorithm for learning a Bayesian network. The approach has been validated experimentally on a dataset of real users and the results are promising.

KEYWORDS
Bayesian Network, CP-net, Recommender System

1 INTRODUCTION
Over the past decades, significant efforts have been undertaken by researchers, practitioners and companies to develop various types of recommender systems that meet users' requests [3, 23]. The aim of these systems is to personalize service recommendations for individual users, as well as to aggregate users' preferences in order to recommend a service to a group of users, in domains ranging from movies [26] to restaurants [14, 18], from hotels to items or products, etc.

Advances in recommender systems have been achieved by considering the context, which is basically defined as any information that can be used to characterize the situation of an entity in a particular domain [11]. On the one hand, considering the context can improve the performance of recommender systems, which enhances the satisfaction of users by properly fulfilling their demands [17]. On the other hand, it can make the recommendation task more complex, since changes in the context may cause changes in users' preferences over time. Self-adaptive recommender systems have been developed to overcome such problems in application domains where the context of the users and/or of the services can influence the recommendation [6, 18, 24].

Users' preferences are often qualitative and conditional. For example: if it is sunny, I prefer to go to a restaurant that has a garden with tables outside. CP-nets [5] are a graphical model that represents this kind of preferences in a compact and intuitive way. Briefly, a CP-net is a graph in which each node is labeled with a table describing the user's preference over the alternative values of this node, given the values of its parent nodes. An example of a CP-net is shown in Figure 1(a). CP-nets have already been used to model users' preferences in automated decision making and to model human preferences in real-world applications.

In this paper, we propose a framework to automatically construct CP-nets from users' past behaviors without demanding any information from the users. Users' past behavior (or preferences) is characterized by the set of domain features that are logged in their profiles through their interactions with the system. For example, in the restaurant recommendation domain, the users' past behavior may be defined by the restaurants previously selected by the user, where each restaurant can be seen as a combination of values for a set of features (such as price range, location, type of restaurant, etc.).

To build a CP-net from an agent's past behavior we perform the following steps.
• Given the user's history, for example the previously selected restaurants, we first identify the most important attributes that have influenced the final choices. The selected attributes become the features/nodes of the CP-net.
• Then, we divide these features into three layers: root layer, intermediate layer and target layer. The root layer contains only the root node, which is the most important feature according to both the user's preferences and the procedure used to extract the features. The root node may be, for example, the price of the restaurant, if the options a user considers when selecting a restaurant depend mainly on his budget. The target layer contains only the target node, which is the most dependent feature. The target node is selected according to domain knowledge. For instance, in the restaurant recommendation domain, the target node may be the type of the restaurant (or food), while in the movie recommendation domain, the target node may be the name of the movie. The intermediate layer contains all the other nodes, corresponding to all the other selected features.
• Now that the features are divided into three groups, we need to express the dependencies among them. We assume that there are only outgoing arcs from the root node and only ingoing arcs into the target node, while, to identify the dependency relations between the features in the intermediate layer, we learn a Bayesian Network (BN) from the user's preferences by exploiting scoring functions (such as BIC and AIC [8]) implemented within a hill climbing algorithm. Finally, from the conditional probabilities annotated with each node of the BN, we define the corresponding conditional preferences associated to every node of the CP-net. A minimal code sketch of these steps follows this list.
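To make the pipeline concrete, here is a minimal sketch in R, the language the paper itself relies on (Information Gain via the FSelector package [20], Bayesian network learning via bnlearn [28]). The data frame restaurant_log, the target column Tr and the top-6 cut-off are hypothetical placeholders, not the authors' actual configuration.

```r
# Sketch of the construction pipeline, assuming a data frame
# 'restaurant_log' with one row per past selection, a factor column
# per feature, and 'Tr' (type of restaurant) as the target.
library(FSelector)  # information gain for feature selection
library(bnlearn)    # Bayesian-network structure learning

ig    <- information.gain(Tr ~ ., restaurant_log)  # step 1: rank the features
feats <- cutoff.k(ig, 6)                           # keep the most important ones

root  <- feats[1]   # step 2: root = most informative feature
inter <- feats[-1]  #         remaining features form the intermediate layer

# Step 3: learn the intermediate-layer dependencies by hill climbing
# guided by a score function (BIC here).
bn <- hc(restaurant_log[, inter], score = "bic")

# Step 4: fit the conditional probability tables, which are later
# converted into the CP-net's conditional preference tables (Section 3.4);
# the root and target layers are then attached by thresholded
# dependency measures (Section 3.5).
cpt <- bn.fit(bn, restaurant_log[, inter])
```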
Experimental results show that the presented approach for building CP-nets from users' past behavior is promising in the restaurant and movie recommendation domains. In the experiments performed on real data-sets, the scoring function BIC turned out to be the best one.

The proposed approach to constructing CP-nets from users' histories is important for two reasons. The first is that users' preferences are essentially qualitative, and capturing these qualitative tendencies increases the knowledge the system has of the users' preferences. The second follows from the first: this construction is significantly important when the service system has to deal with context, which brings up the dynamicity challenge in a real-time recommender system [18]. Changes in the context lead to changes in the users' preferences over time. Hence, modeling users' preferences in a qualitative form enables self-adaptive recommender systems [18] to increase the users' satisfaction. For example, assume we know from the context that the road to the recommended restaurant is blocked by traffic; the system should then recommend another restaurant, starting from the built CP-nets modeling the users' preferences.

Modeling and learning users' preferences expressed via CP-nets has been studied extensively with various techniques, such as observing users or asking them multiple questions [4]. In some studies, researchers start by assuming a dependency structure (the user's CP-net) and then try to learn the user's conditional preferences [9]. Bigot et al. [4] discuss learning Probabilistic CP-nets (PCP-nets), introduced in [10], in two settings (online and off-line), using Bayesian networks to learn the PCP-nets; in both settings they assume the dependency graph is given and then ask multiple queries to the users to build up and learn the structure of the network. Similarly, Guerin et al. [15] present an algorithm for learning CP-net preferences by interacting with users rather than by using users' histories. Learning conditional preferences may be a tedious and costly task, even with acyclic CP-nets. However, the complexity of the problem can be reduced by interacting with the users to simplify the learning procedure. For instance, Koriche et al. [19] propose an approach that identifies a preference ordering over binary domains using membership queries. Similar work has been done in [12], where pairwise comparisons O_i > O_j between outcomes are used, under certain assumptions/constraints on the inputs, to learn the structure of the users' CP-nets.

The rest of the paper is organized as follows. Section 2 provides some basic notions of CP-nets and Bayesian networks. Section 3 presents our CP-net constructor, and Section 4 describes the experimental evaluation and the results. Finally, Section 5 summarizes the paper and provides some hints for future work.

2 CP-NET AND BAYESIAN NETWORK
In this section we present the key notions of CP-nets and Bayesian Networks.

CP-net: A CP-net [5] is a graphical model to represent conditional and qualitative preference relations between variables (aka features). Assume there is a set of variables V = {X_1, ..., X_n} with finite domains D(X_1), ..., D(X_n). For each variable X_i, the user specifies a set of parents P(X_i) that can affect her preferences over the values of X_i. This defines a dependency graph in which every variable X_i has P(X_i) as its immediate predecessors. For each node X_i, there is a conditional preference table that shows, for each possible combination of parent values, the preference over the values of X_i.

An example of a CP-net is shown in Figure 1(a). It contains three features (aka variables) X_1, X_2 and X_3, standing for the Price, the Quality and the Type of restaurant, respectively. X_1 is an independent variable, while X_2 depends on X_1, and X_3 depends on both X_1 and X_2. This CP-net models the fact that the user strictly prefers a restaurant with a medium price (x_1^1) to an expensive one (x_1^2), while his preference between "medium" quality (x_2^1) and "high" quality (x_2^2) is conditioned on the selected price. In addition, the preference over the type of restaurant, Chinese (x_3^1) or Mexican (x_3^2), depends on both the quality of the restaurant and the price.
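Written as ceteris-paribus preference statements, the Figure 1(a) example has the shape below. Only the unconditional statement on X_1 is spelled out in the text, so the conditional rows are illustrative assumptions about what the figure's tables might contain:

```latex
% Ceteris-paribus statements for the CP-net of Figure 1(a).
% Only the first (unconditional) statement is given in the text;
% the conditional rows are illustrative placeholders.
\begin{align*}
X_1 &:\; x_1^1 \succ x_1^2 \\
X_2 &:\; x_1^1 : x_2^1 \succ x_2^2 \qquad x_1^2 : x_2^2 \succ x_2^1 \\
X_3 &:\; x_1^1 \wedge x_2^1 : x_3^1 \succ x_3^2 \quad
      \text{(and one row for each of the other three parent contexts)}
\end{align*}
```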
Bayesian Network (BN): A Bayesian network is a probabilistic graphical model that represents a set of variables and their conditional dependencies via a directed acyclic graph (DAG) G = (V, E_G) [7, 16], where V is the set of features and E_G is the set of directed arcs (dependencies) between the features, e.g., X_i → X_j; a constraint in BNs forbids any directed cycle (similarly to the concept of an acyclic CP-net). The strength of the correlation between the features is defined by the second component of the BN: a set of parameters that gives the conditional probability distributions of the variables.

Considering n variables V = {X_1, ..., X_n} in a BN, the joint distribution P(X_1 = x_1, X_2 = x_2, ..., X_n = x_n) can be factorized as follows:

    P(X_1, ..., X_n) = P(X_1) × P(X_2 | X_1) × ... × P(X_n | X_1, ..., X_{n-1})
                     = ∏_i P(X_i | X_1, ..., X_{i-1})                        (1)

In a BN, the value of each node X_i ∈ V depends only on the values of its parents, so that P(X_1, X_2, ..., X_n) = ∏_i P(X_i | Pa(X_i)); a node without parents is an independent node. For each node X_i, there is a conditional probability table that shows, for each possible combination of parent values, the probability distribution over the values of X_i.

An example of a BN is shown in Figure 1(b), where the probability that a user targets a Chinese (x_3^1) or a Mexican (x_3^2) restaurant for dinner depends on the values of the node's parents, which are the Price (X_1) and the Quality (X_2) of the restaurant. It can be seen that the value of the Quality of the restaurant depends in turn on the Price (X_1). The conditional probability tables associated with the variables are also shown in Figure 1(b). For example, the table associated with variable X_1 shows that the first value in the domain of X_1 has probability 0.2. The other conditional probability tables list the probability that the child variable takes each of the values in its domain, for every combination of values in the domains of its parents.

Figure 1: A CP-net and a Bayesian Network.

3 TECHNICAL APPROACH
This section describes our approach to building a CP-net that represents the conditional and qualitative preferences of a user, starting from the user's history. For the sake of clarity, we explain how the constructor works in the following subsections through examples of preferences taken from the restaurant recommendation domain.

3.1 Feature Selection
Among the hundreds of attributes logged in the data-set, sourced from the users' interactions while searching for and selecting appropriate services (e.g., selecting the desired restaurant, in this context), we must select the features that most influence the users' preferences. Thus, given a set of variables/features (from the data set) V = {X_1, X_2, ..., X_n}, without loss of generality we assume that X_n is the variable corresponding to the target node of the CP-net, which is the most constrained variable. In this context, the target node corresponds to the type of restaurant/food, whose values may be the types of the restaurants that the user has selected before, as logged in his profile. We refer to X_n ≡ X_t as the target variable. The values of the target variable depend on the set of remaining variables V \ {X_n}.

Since every variable (among the features that a user may consider when selecting a service/restaurant, e.g., price, location, etc.) may influence the user in targeting a service/restaurant, a feature selection process is required to enhance the effectiveness of the framework. In other words, in feature selection we aim to nominate a subset V_s ⊂ V, with |V_s| = m for a given m, containing the features that most leverage the values of the target feature. Thus, to select the most important features to be considered in the constructor space, we use Information Gain¹, which is widely used in the state of the art for feature selection [21]. This method selects the feature/attribute with the greatest information gain (more details about the function can be found in [21]). Eventually, the list of features in V_s is sorted in decreasing order of importance, building the list {X_0, X_1^s, ..., X_m^s}. Then, the system attaches X_t (as the target node) at the end of the list, obtaining V = {X_0, X_1^s, ..., X_m^s, X_t}, the desired output of the feature selection process. Once this is done, the system breaks the sorted features down into three layers, detailed in the following section.

¹ To use the Information Gain function, we took advantage of the FSelector [20] library in R.
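As the footnote indicates, the feature selection step relies on the FSelector library in R. A minimal sketch of this step, with a hypothetical data frame restaurant_log, target column Tr and m = 6:

```r
# Rank the features by information gain w.r.t. the target (Section 3.1)
# and keep the top m. 'restaurant_log' and m = 6 are assumptions.
library(FSelector)

ig <- information.gain(Tr ~ ., restaurant_log)  # one importance score per feature
Vs <- cutoff.k(ig, k = 6)                       # names of the top-6 features, sorted
V  <- c(Vs, "Tr")                               # append the target node X_t at the end
```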
3.2 Layers Extraction
Given the above list V = {X_0, X_1^s, ..., X_m^s, X_t}, we aim to build a directed acyclic graph that consists of three layers: the Root layer, the Intermediate layer, and the Target layer. Hereafter, we describe in detail each layer and the links connecting the nodes inside the graph. Notice that from now on the terms "node" and "feature" refer to the same concept.

• Root Layer: this layer contains only the root node, which is the most important feature among those in the list. In other words, given the list V, the first feature X_0 is considered the root node. Since X_0 is an independent feature, it has no incoming links from the other nodes. As an example, consider the restaurant selection domain, where the options a user considers when selecting a restaurant depend mainly on his budget: in this case, the price of the restaurant could be the feature corresponding to the root node.
• Intermediate Layer: the main procedure for extracting the users' conditional preferences, on the basis of the strength of the relations between features under certain conditions or thresholds, is executed in this layer. This layer contains all the nodes except the Root and Target nodes, i.e., {X_1^s, ..., X_m^s}. To set the internal links between intermediate nodes, we need to measure the dependence between every pair of nodes (X_i^s, X_j^s). The algorithm adds a link between those nodes whose dependence value is greater than or equal to a given threshold. This threshold could be determined automatically or manually; in this paper we fix it manually, as described in Section 4. For example, assuming that the computed dependencies between the features in the Intermediate layer lie in [0, 1], we can set the threshold to Tr = 0.7.
• Target Layer: as the name implies, this layer contains the target node X_t, which represents the user's preference (the last attribute in the sorted list) in the specific domain. This layer has only incoming links, from the nodes located in the Intermediate layer or in the Root layer.

The action flow is shown in Figure 2, where the selected features in the list V (Fig. 2(a)) are segmented into three layers (Fig. 2(b)). Then, the proposed constructor integrates the layers (Fig. 2(c)) based upon the dependencies between the features in the second layer, obtained by exploiting the score functions and the algorithm explained in Section 3.3. A small code sketch of the layer split follows Figure 2.

Figure 2: The CP-net Constructor Frontend: (a) the set of features in the list V; (b) the empty starting graph with three layers; (c) the final CP-net graph.
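A trivial sketch of the layer split, assuming V is the sorted character vector of feature names from Section 3.1 with the target name last:

```r
# Split the sorted list V into the three layers of Section 3.2.
root_layer   <- V[1]                 # X_0: the most informative feature
target_layer <- V[length(V)]         # X_t: e.g., the type of restaurant
inter_layer  <- V[-c(1, length(V))]  # X_1^s ... X_m^s
```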
3.3 Feature Dependency Measurement and Constructing CP-net
Due to the similarity between the concepts described above, we exploit Bayesian network score functions to construct the main shape of the users' CP-nets in the second layer.

Many algorithms and techniques have been developed to tackle the problem of building a Bayesian Network, and their performance varies with the score function used, from data set to data set and domain to domain. Hence, we implement the framework with various kinds of score functions, to decrease misclassification and to identify the technique that best accommodates our data and provides the highest performance.

Score functions for Bayesian networks fall into two main categories: Information Theory scores, such as the Mutual Information Test (MIT), the Bayesian Information Criterion (BIC), the Akaike Information Criterion (AIC), Log-Likelihood (LL), and Normalized Maximum Likelihood (NML); and Bayesian scores, such as Bayesian Dirichlet (BD) and K2 [8].

In general, Bayesian score functions try to find a network B with maximum score S_G(B, D) given the data D [28]. In these functions, the procedure for constructing the network starts from a prior distribution and then computes the posterior probability distribution; the network with the maximum score is the best among the possible networks. Information Theory functions, instead, act on the basis of compression [2]: they attempt to recognize the most compact model over the data D, so the model with the minimum score is the best among the others.

To show how the constructor and the score functions work, we use the Bayesian Information Criterion (BIC) [22] throughout this paper, but we implement the constructor with all the applicable functions (in the second layer) to evaluate the performance of the algorithm in the specific domain.

The Bayesian Information Criterion [22], also known as the "Schwarz criterion", is the most popular of these score functions and is based on the likelihood function. Although the performance of a score function differs considerably from data set to data set, BIC has been shown to consistently outperform the other score functions in constructing Bayesian network structures on different data sets [22]. For this score function, a lower value indicates a better fit, i.e., the minimum information loss with respect to the candidate model, as shown in Equation 2:

    BIC = −2 ln(L̂) + k ln(N)                        (2)

where k is the number of free parameters to be estimated, N is the number of observations (the sample size), and L̂ is the maximum value of the likelihood function, attained at the parameter set θ̂. A lower BIC value indicates that the obtained model is more likely to be the best and true network model among the candidates. More details about how these functions work are presented in [25].
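As a concrete counterpart, bnlearn [28] can compute these scores for a candidate structure. The data frame dat is hypothetical (discrete features named after Figure 1: Pr = price, Q = quality, Tr = type). Note the sign convention: Equation (2) is minimized, while bnlearn returns a rescaled BIC (log-likelihood minus penalty) that is maximized, so the two orderings of candidate models agree.

```r
# Score a candidate Bayesian-network structure with BIC and AIC.
library(bnlearn)

dag <- model2network("[Pr][Q|Pr][Tr|Pr:Q]")  # the structure of Figure 1
score(dag, dat, type = "bic")                # BIC of this candidate
score(dag, dat, type = "aic")                # AIC, for comparison
```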
Hence, to apply the above functions and construct the desired network in the second layer, we borrow the structures shown in [1], where various algorithms are discussed, such as Simulated Annealing, heuristic algorithms, and Genetic algorithms. Taking advantage of greedy search and heuristic algorithms [13], we use the Hill Climbing (HC) algorithm to execute the BIC, AIC, MIT, LL, BD, NML, etc. functions and obtain the structure of the users' CP-nets in the second layer, following the procedure shown in Algorithm 1.

Recalling the feature selection step and the splitting of the list V into three layers, HC starts with an empty graph G in the second layer and attempts to find a model with the best score by incrementally searching among the possible models in its local neighborhood. In other words, this is an optimization procedure that begins with an arbitrary structure of the network and then tries to find a better network by incrementally improving the score: whenever a model with a better score is found, it replaces the old model. These steps are repeated until no model with a better score can be found. Although the algorithm can get stuck in a local optimum, depending on the starting point, we take advantage of its high performance and accuracy to build the network.

Algorithm 1: CP-net Constructor Algorithm
  Rnode                           ▷ root node
  Inodes                          ▷ intermediate nodes
  Tnode                           ▷ target node
  ScoreG                          ▷ score of the current graph G
  ScoreNG                         ▷ score of a new candidate graph
  function Construct:
    start with the empty graph G
    ScoreG ← Score(G)
    while the score of G can still be improved do
      apply an acyclicity-preserving operation to G: add, delete or reverse an arc
      if ScoreNG is better than ScoreG then
        G ← the new candidate; ScoreG ← ScoreNG
    DepMatrix ← dependency(Rnode, Inodes, Tnode)
    if DepMatrix(Rnode, Inodes ∪ Tnode) ≥ Tr then
      connect Rnode → Inodes and Tnode
    if DepMatrix(Inodes, Tnode) ≥ Tr then
      connect Inodes → Tnode

In the following sections we show how the probabilities of the learned network are converted into preferences (Section 3.4) and how the layers are connected to define the CP-net (Section 3.5); a concrete hill-climbing call is sketched below.
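The hill-climbing search itself is available off the shelf in bnlearn, whose hc() starts from the empty graph and greedily applies the add/delete/reverse arc operators, keeping the graph acyclic, until the score stops improving; this mirrors the search loop of Algorithm 1. The data frame dat_inter (the intermediate-layer columns only) is hypothetical:

```r
library(bnlearn)

bn_bic <- hc(dat_inter, score = "bic")     # BIC-guided search
bn_k2  <- hc(dat_inter, score = "k2")      # Bayesian K2 score
bn_ll  <- hc(dat_inter, score = "loglik")  # plain log-likelihood

arcs(bn_bic)  # the learned dependency arcs X_i -> X_j
```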
3.4 Converting Probability to Preferences
This section shows how to interpret the strength of the correlation between nodes as user preferences under the CP-net semantics. To this end, the system takes the value of a feature with the maximum probability as the user's preference. This means that, if a node in the list V, e.g., Price, has two values in its domain, such as cheap and expensive, the system picks from the conditional probability table computed for that node in the Bayesian Network the value (e.g., cheap) with the highest probability as the user's preference (see the table associated with X_2 in Figure 3).

Once the structure of the network in the second layer is obtained, the system converts the probabilities into preferences as follows. The procedure starts from the independent nodes. The most probable value of a feature is considered the preferred value among the possible values. The independent nodes (as parents) influence the values of the remaining nodes (as children) on the basis of the probability tables, as shown in Figure 3. To better understand the conversion, consider v = {X_2, X_3, X_4}, v ⊂ V, with three features and binary domains D(X_2) = {x_2^1, x_2^2}, D(X_3) = {x_3^1, x_3^2} and D(X_4) = {x_4^1, x_4^2}. As shown in Figure 3, X_2 influences X_3 and X_4 in the second layer, and similarly X_4 influences X_3; hence, X_3 is identified as the most dependent node, with its own conditional probability table. In the conditional preference table of the CP-net associated with X_2, we have x_2^2 preferred to x_2^1, since the probability of x_2^2 is 0.8 while the probability of x_2^1 is 0.2. Then, if x_2^2 is the preferred value for X_2, x_4^1, with probability 0.7, is the preferred value for X_4. Similarly, the preferred value of X_3 depends on the values of X_2 and X_4 according to the probabilities. Here we gave a simple example with binary domains, but the approach extends to any number of values in the domains of the features in the sorted list V. A code sketch of this conversion follows Figure 3.

Figure 3: From a Bayesian network to the corresponding CP-net.
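A sketch of this conversion with bnlearn, under the assumption that bn is the learned structure and dat the user's data (both hypothetical), with node names as in the X_2/X_3/X_4 example:

```r
# For every parent context, order the values of a node by decreasing
# probability; the most probable value becomes the preferred one.
library(bnlearn)

fit <- bn.fit(bn, dat)   # conditional probability tables of the BN
cpt <- fit$X3$prob       # CPT of X3: values of X3 x parent contexts

pref <- apply(cpt, seq_along(dim(cpt))[-1],
              function(p) paste(names(sort(p, decreasing = TRUE)),
                                collapse = " > "))
pref                     # one preference row per parent context, e.g. "x31 > x32"
```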
3.5 Connecting the Layers
This step integrates the Root and the Target nodes with the nodes located in the Intermediate layer. Since the Root node dominates the other nodes of the two lower layers, the system generates a matrix of dependencies between them. Practically, the aim is to find the strength of the relations between the Root and the rest of the nodes; to obtain these dependencies we use the Chi-square, Information Gain, Gain Ratio and Symmetrical Uncertainty functions [21]. The generation of the dependency matrix is broken down into two parts. First, it is applied between the Root layer and the Intermediate and Target layers together (since the Root node dominates all the nodes located in these two layers). Second, it is applied between the Intermediate layer and the Target layer. Having these two dependency matrices at hand, we set a threshold CV ∈ [0, 1], as a Confidence Value, to eliminate the links between layers that do not meet the threshold. This helps to single out the most important dependencies within the user's CP-net, i.e., those that characterize the user's preferences. The described work-flow produces the final CP-net characterizing the user's preferences.

To validate the correctness of the obtained CP-nets, we carried out cross validation on real data-sets, as detailed in the next section.

4 EXPERIMENTAL EVALUATION AND RESULTS
The main task of recommender systems is to provide the best personalized service for users in various domains. In this paper, we validate our procedure for constructing users' CP-nets from their past behaviors in the restaurant and movie recommendation domains. In the following, we describe the experimental setup in detail.

4.1 Data Collection
We exploited two real datasets of users' preferences, one over a set of restaurants in Mexico City and one over movies in Denmark and the UK, both from the UCI repository². The former records users' restaurant selections: it contains 1116 samples, each carrying 11 different features, where each sample indicates the selection of one restaurant by a user. The latter records users' movie ratings: it includes 2297 instances from around 122 unique users, where every sample holds 28 different features. Due to the page limit, we present only the features of the restaurant selection dataset, in Table 1, which shows the description and the abbreviation of each feature.

² https://archive.ics.uci.edu/ml/datasets.html

Table 1: Data-set Information

#  | Feature | Restaurant Selection: Description and domain
1  | Ub | user's budget (low, medium, high)
2  | Q  | quality of the restaurant (low, medium, high)
3  | Al | whether the restaurant serves alcohol (yes, no)
4  | Sm | smoking area inside the restaurant (yes, no)
5  | Ac | accessibility of the restaurant (easy, medium, difficult)
6  | Pr | restaurant price (low, medium, high)
7  | Ar | outdoor or indoor restaurant
8  | Os | other services, like internet, etc. (yes, no)
9  | Pk | whether the restaurant has a parking area (yes, no)
10 | Ur | restaurant rating given by users/people (low, medium, high)
11 | Tr | type of restaurant (Italian, Latin-American, Asian)

We first group the samples that belong to the same user. Next, we exclude the users who have recorded fewer than 10 selections or ratings, since a small number of samples does not allow the system to learn and make inferences efficiently. Thus, in the first dataset we select only the 18 users with at least 10 recorded samples, while the users whose ratings are logged in the movie dataset all have enough instances, so all of its unique users are included.

4.2 Evaluation & Model Selection
To achieve the best CP-net structure representing a user's preferences, we execute the algorithm with a range of parameters, where each combination of parameter values specifies a specific CP-net structure. In each round, the grid search technique is employed to find the optimal tuple of hyperparameter values, and hence the best model for inference.

Evaluation Setting for Dataset 1 (Restaurant): Since we have a limited number of samples per user in the Restaurant dataset, iterative leave-one-out cross validation is the best option for our multi-class classification problem. In each round, one sample is left out as the test set and the remaining samples are used to learn a CP-net with our proposed method and to train the model using the Bayesian method. In the following, we list the parameters and the sets of values used to construct the candidate CP-nets.

• Number of selected features: we run the algorithm with various numbers of features to find the number that provides the best result; we start from all 11 features and gradually reduce to the top 4 features in the list.
• Dependency: {0.5, 0.6, 0.7}. Since the dependencies between features are defined in the interval [0, 1], we tune this threshold, as with the number of features, to find the best results.
• Direction: {0.5, 0.6, 0.7}. We also tune the algorithm with three values to determine the direction of the dependency between the features.
• Threshold (Tr) for connecting the first and the second layer: {0.03, 0.04, 0.05}. As mentioned above, we use different functions to calculate the dependencies between layers, and thus may need different threshold values.
• Threshold (Tr) for connecting the second and the third layer: {0.03, 0.04, 0.05}.

Evaluation Setting for Dataset 2 (Movie): In contrast to Dataset 1, the unique users in the Movie dataset have a sufficient number of instances. Thus, 10-fold cross validation is used to validate the stability of the approach when fed new data: in each round, 10% of the data is left out as the test set and the remaining 90% is used to learn a CP-net with the proposed constructor and to train the model with the Bayesian method. Similar settings and parameters are used for the number of selected features, dependency, direction and thresholds. A code sketch of the Dataset 1 leave-one-out loop is given below.
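A sketch of the leave-one-out loop, assuming a per-user data frame user_log of factor features with target Tr; the learning and prediction calls use bnlearn, and all names are hypothetical:

```r
library(bnlearn)

hits <- logical(nrow(user_log))
for (i in seq_len(nrow(user_log))) {
  train <- user_log[-i, ]               # all samples but one
  test  <- user_log[i, , drop = FALSE]  # the held-out sample
  fit   <- bn.fit(hc(train, score = "bic"), train)
  # most probable value of the target given the observed features
  pred  <- predict(fit, node = "Tr", data = test, method = "bayes-lw")
  hits[i] <- as.character(pred) == as.character(test$Tr)
}
mean(hits)  # leave-one-out accuracy for this user
```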
Once the model is trained, the algorithm computes a score for each restaurant or movie (rating) option. This score shows how likely the restaurant is to be chosen according to the user's interests; the system then selects the restaurant with the highest probability value as the user's preference.

To evaluate the performance of the method we use the following well-known metrics: (i) Precision, the fraction of retrieved instances that are relevant; (ii) Recall, the fraction of relevant instances that are retrieved; and (iii) F-measure, a weighted average of precision and recall [27]. The performance on each class is computed (for every user), and the average over the performance on all classes is taken as the performance of the method for that user. The standard definitions are recalled below.
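Recalling the standard per-class definitions from [27] (assuming the balanced F_1 form of the F-measure), with TP, FP and FN the per-class true-positive, false-positive and false-negative counts:

```latex
\begin{align*}
\text{Precision} &= \frac{TP}{TP+FP}, &
\text{Recall}    &= \frac{TP}{TP+FN}, &
F\text{-measure} &= \frac{2 \cdot \text{Precision} \cdot \text{Recall}}
                         {\text{Precision} + \text{Recall}}
\end{align*}
```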
4.3 Results
Table 2 presents our preliminary experimental results in the restaurant and movie recommendation domains. In particular, it shows the performance of our approach with the various score functions, measured in terms of Precision, Recall and F-measure.

Table 2: The performance of the proposed method on the two data-sets.

Dataset 1 (Restaurant):
Score Function | Precision | Recall | F-measure | Description | # of Features | Arc Dep Tr | Layer Tr
BIC    | 0.71 | 0.708 | 0.706 | applicable     | 6 | 0.7 | 0.03
AIC    | 0.70 | 0.69  | 0.694 | applicable     | 7 | 0.6 | 0.05
Loglik | x    | x     | x     | not applicable | x | x   | x
K2     | x    | x     | x     | causing cycle  | x | x   | x
MDL    | 0.54 | 0.57  | 0.555 | applicable     | 4 | 0.5 | 0.05

Dataset 2 (Movie):
Score Function | Precision | Recall | F-measure | Description | # of Features | Arc Dep Tr | Layer Tr
BIC    | 0.51 | 0.52 | 0.52  | applicable     | 8 | 0.4 | 0.03
AIC    | x    | x    | x     | causing cycle  | x | x   | x
Loglik | 0.54 | 0.43 | 0.485 | applicable     | 8 | 0.5 | 0.03
K2     | 0.55 | 0.42 | 0.485 | applicable     | 7 | 0.6 | 0.03
MDL    | x    | x    | x     | not applicable | x | x   | x

Result for Dataset 1: As shown in Table 2, among the score functions implemented within the hill-climbing algorithm, only BIC, AIC and MDL were applicable on our data-set. The parameters automatically set to generate these results vary from function to function, since the functions use different strategies to compute their scores when constructing the BN: BIC obtained its best result with 6 features, 0.7 arc dependency and 0.03 dependency between layers, while AIC obtained its best result with 7 features, 0.6 arc dependency and 0.05 layer dependency. The precision and recall values of BIC and AIC are very similar, around ~0.70, with BIC performing slightly better on both. In contrast, MDL provides poor results, with 0.54, 0.57 and 0.555 in precision, recall and F-measure, respectively.

It is noticeable that BIC, AIC and MDL with smaller or greater numbers of features (than reported in Table 2) mostly result in incomplete and/or very complex and cyclic graphs. In addition, we have not reported results for K2 and Loglik on this dataset, since with these score functions the approach often constructs CP-nets with cycles, especially after appending the nodes of the other layers in the final step.

Result for Dataset 2: Analogously to the first evaluation, the BIC function works slightly better in all metrics, with precision 0.51, recall 0.52, F-measure 0.52 and ~0.50 accuracy, w.r.t. K2 and Loglik on this dataset (Figure 4 shows a complex CP-net constructed by BIC with 8 features). However, these figures are weaker than the results of the first assessment on the Restaurant dataset.

In contrast to the first experiment, both K2 and Loglik were applicable here, reporting around ~0.49 in F-measure. These low figures may be due to the presence of 5 classes in the target layer (the target node has 5 values, Very Bad, Bad, Medium, Good and Very Good, indicating the users' ratings of movies), which makes classification and prediction arduous. Taking 20% accuracy as the baseline for random classification, the obtained figures are about 30 percentage points above the baseline, which is an admissible result. However, further investigation is required, both in design and in evaluation, to increase and assess the performance of the constructor.

Figure 4: A complex CP-net with 8 features.

5 CONCLUSION
We have presented a system for automatically constructing CP-nets modeling users' preferences from their past interactions with a service provider. To construct the user's CP-net we first construct a Bayesian network, exploiting the heuristic Hill Climbing algorithm to run various score functions, such as BIC, AIC, Loglik, K2 and MDL, in order to recognize the best graph model among all the possible models. Empirical results on the restaurant selection and movie domains have shown that this constructor may have a significant impact on enhancing users' satisfaction.

This construction may be very useful when the recommender system must deal with context and the users' preferences may change over time. We plan to integrate the proposed constructor in the Self-adaptive Context-Aware Recommender System (SaCARS) presented in [18]. This integration will allow SaCARS to learn and model the users' conditional preferences entirely from their behavior, without human interference.

REFERENCES
[1] Stuart Russell and Peter Norvig. [n. d.]. Artificial Intelligence: A Modern Approach. Prentice Hall.
[2] Hirotogu Akaike. 1998. Information theory and an extension of the maximum likelihood principle. In Selected Papers of Hirotugu Akaike. Springer, 199–213.
[3] Xavier Amatriain and Justin Basilico. 2016. Past, Present, and Future of Recommender Systems: An Industry Perspective. In Proceedings of the 10th ACM Conference on Recommender Systems (RecSys '16). ACM, New York, NY, USA, 211–214. https://doi.org/10.1145/2959100.2959144
[4] Damien Bigot, Jérôme Mengin, and Bruno Zanuttini. 2014. Learning Probabilistic CP-nets from Observations of Optimal Items. In STAIRS. 81–90.
[5] Craig Boutilier, Ronen I. Brafman, Carmel Domshlak, Holger H. Hoos, and David Poole. 2004. CP-nets: A Tool for Representing and Reasoning with Conditional Ceteris Paribus Preference Statements. J. Artif. Int. Res. 21, 1 (Feb. 2004), 135–191. http://dl.acm.org/citation.cfm?id=1622467.1622473
[6] Yuriy Brun, Ron Desmarais, Kurt Geihs, Marin Litoiu, Antonia Lopes, Mary Shaw, and Michael Smit. 2013. A design space for self-adaptive systems. In Software Engineering for Self-Adaptive Systems II. Springer, 33–50.
[7] Luis M. de Campos. 2006. A scoring function for learning Bayesian networks based on mutual information and conditional independence tests. Journal of Machine Learning Research 7 (Oct. 2006), 2149–2187.
[8] Alexandra M. Carvalho. 2009. Scoring functions for learning Bayesian networks. INESC-ID Tec. Rep. (2009).
[9] Yann Chevaleyre, Frédéric Koriche, Jérôme Lang, Jérôme Mengin, and Bruno Zanuttini. 2010. Learning ordinal preferences on multiattribute domains: The case of CP-nets. In Preference Learning. Springer, 273–296.
[10] Cristina Cornelio, Judy Goldsmith, Nicholas Mattei, Francesca Rossi, and K. Brent Venable. 2013. Dynamic and probabilistic CP-nets. MPREF 2013 (2013).
[11] Anind K. Dey. 2001. Understanding and Using Context. Personal Ubiquitous Comput. 5, 1 (Jan. 2001), 4–7. https://doi.org/10.1007/s007790170019
[12] Yannis Dimopoulos, Loizos Michael, and Fani Athienitou. 2009. Ceteris Paribus Preference Elicitation with Predictive Guarantees. In IJCAI, Vol. 9. 1–6.
[13] Thomas A. Feo and Mauricio G. C. Resende. 1995. Greedy randomized adaptive search procedures. Journal of Global Optimization 6, 2 (1995), 109–133.
[14] Yanjie Fu, Bin Liu, Yong Ge, Zijun Yao, and Hui Xiong. 2014. User preference learning with multiple information fusion for restaurant recommendation. In Proceedings of the 2014 SIAM International Conference on Data Mining. SIAM, 470–478.
[15] Joshua T. Guerin, Thomas E. Allen, and Judy Goldsmith. 2013. Learning CP-net preferences online from user queries. In International Conference on Algorithmic Decision Theory. Springer, 208–220.
[16] Finn V. Jensen. 1996. An Introduction to Bayesian Networks. Vol. 210. UCL Press, London.
[17] Reza Khoshkangini, Maria Silvia Pini, and Francesca Rossi. 2016. A design of context-aware framework for conditional preferences of group of users. In Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing. Springer, 97–112.
[18] Reza Khoshkangini, Maria Silvia Pini, and Francesca Rossi. 2016. A self-adaptive context-aware group recommender system. In Conference of the Italian Association for Artificial Intelligence. Springer, 250–265.
[19] Frédéric Koriche and Bruno Zanuttini. 2010. Learning Conditional Preference Networks. Artif. Intell. 174, 11 (July 2010), 685–703. https://doi.org/10.1016/j.artint.2010.04.019
[20] Miron B. Kursa, Witold R. Rudnicki, et al. 2010. Feature selection with the Boruta package. J. Stat. Softw. 36, 11 (2010), 1–13.
[21] Changki Lee and Gary Geunbae Lee. 2006. Information gain and divergence-based feature selection for machine learning-based text categorization. Information Processing & Management 42, 1 (2006), 155–165.
[22] Zhifa Liu, Brandon Malone, and Changhe Yuan. 2012. Empirical evaluation of scoring functions for Bayesian network model selection. BMC Bioinformatics 13, Suppl. 15 (2012), S14.
[23] Jie Lu, Dianshuang Wu, Mingsong Mao, Wei Wang, and Guangquan Zhang. 2015. Recommender System Application Developments. Decis. Support Syst. 74, C (June 2015), 12–32. https://doi.org/10.1016/j.dss.2015.03.008
[24] Alberto Marchetti-Spaccamela, Andrea Vitaletti, Luca Becchetti, and Ugo Colesanti. 2008. Self-Adaptive Recommendation Systems: Models and Experimental Analysis. In 2008 Second IEEE International Conference on Self-Adaptive and Self-Organizing Systems (SASO). 479–480. https://doi.org/10.1109/SASO.2008.55
[25] Radford M. Neal. 2012. Bayesian Learning for Neural Networks. Vol. 118. Springer Science & Business Media.
[26] Chihiro Ono, Mori Kurokawa, Yoichi Motomura, and Hideki Asoh. 2007. A Context-Aware Movie Preference Model Using a Bayesian Network for Recommendation and Promotion. In Proceedings of the 11th International Conference on User Modeling (UM '07). Springer-Verlag, Berlin, Heidelberg, 247–257. https://doi.org/10.1007/978-3-540-73078-1_28
[27] David Martin Powers. 2011. Evaluation: from precision, recall and F-measure to ROC, informedness, markedness and correlation. (2011).
[28] Marco Scutari. 2009. Learning Bayesian networks with the bnlearn R package. arXiv preprint arXiv:0908.3817 (2009).