Metric Learning for Value Alignment

Andrea Loreggia¹, Nicholas Mattei², Francesca Rossi³, K. Brent Venable²,⁴
¹ University of Padova, Department of Mathematics, Padova, Italy
² Tulane University, Department of Computer Science, New Orleans, LA, USA
³ IBM Research, IBM T.J. Watson Research Center, Yorktown Heights, NY, USA
⁴ Institute for Human and Machine Cognition (IHMC), Pensacola, FL, USA

Abstract

Preferences are central to decision making by both machines and humans. Representing, learning, and reasoning with preferences is an important area of study both within computer science and across the social sciences. When we give our preferences to an AI system we expect the system to make decisions or recommendations that are consistent with our preferences, but the decisions should also adhere to certain norms, guidelines, and ethical principles. Hence, when working with preferences it is necessary to understand and compute a metric (distance) between preferences – especially if we encode both user preferences and ethical systems in the same formalism. In this paper we investigate the use of CP-nets as a formalism for representing orderings over actions for AI systems. We leverage a recently proposed metric for CP-nets and a neural network architecture, CPMETRIC, for computing this metric. Using these two tools we look at how one can build a fast and flexible value alignment system.

1 Introduction

Preferences are central to individual and group decision making by both computer systems and humans. Due to this central role in decision making, the study of representing [Rossi et al., 2011], learning [Fürnkranz and Hüllermeier, 2010], and reasoning [Domshlak et al., 2011] with preferences is a focus of study within computer science and in many other disciplines including psychology and sociology [Goldsmith and Junker, 2009]. Individuals express their preferences in many different ways: pairwise comparisons, rankings, approvals (likes), positive or negative examples, and many more; examples are collected in various libraries and databases [Mattei and Walsh, 2013; Mattei and Walsh, 2017; Bache and Lichman, 2013]. A core task in working with preferences is understanding the relationship between preferences. This often takes the form of a dominance query, i.e., which item is more or most preferred, or a distance measure, i.e., which object is the closest to my stated preference. These types of reasoning are important in many domains including recommender systems [Fattah et al., 2018], collective decision making [Brandt et al., 2016], and value alignment systems [Russell et al., 2015; Loreggia et al., 2018c; Loreggia et al., 2018b].

Using a formal structure to model preferences, especially one that directly models dependency, can be useful for reasoning. For example, it can support reasoning based on inference and causality, and provide more transparency and explainability, since the preferences are explicitly represented and hence scrutable [Kambhampati, 2019]. A number of compact preference representation languages have been developed in the literature for representing and reasoning with preferences; see the work of Amor et al. [2016] for a survey of compact graphical models. We specifically focus on conditional preference networks (CP-nets) [Boutilier et al., 2004].

CP-nets are a compact graphical model used to capture qualitative conditional preferences over features (variables) [Boutilier et al., 2004]. Qualitative preferences are important as there is experimental evidence that qualitative preferences may more accurately reflect humans' preferences in uncertain information settings [Popova et al., 2013; Allen et al., 2015]. CP-nets are a popular formalism for specifying preferences in the literature and have been used for a number of applications including recommender systems and product specification [Pu et al., 2011; Fattah et al., 2018]. Consider a car that is described by values for all its possible features: make, model, color, and stereo options. A CP-net consists of a dependency graph and a set of statements of the form "all else being equal, I prefer x to y." For example, in a CP-net one could say "Given that the car is a Honda Civic, I prefer red to yellow."; the condition sets the context for the preference over alternatives. These preferences are qualitative, i.e., there is no quantity expressing the degree of preference.
A CP-net induces an ordering over all possible outcomes, i.e., all complete assignments to the set of features. This is a partial order if the dependency graph of the CP-net is acyclic, i.e., the conditionality of the statements does not create a cycle, as is often assumed in work with CP-nets [Goldsmith et al., 2008]. The size of the description of the CP-net may be exponentially smaller than the partial order it describes. Hence, CP-nets are called a compact representation, and reasoning and learning on the compact structure, instead of the full order, is an important topic of research. Recent work proposes the first formal metric to describe the distance between CP-nets [Loreggia et al., 2018a] and the related formalism of LP-trees [Li and Kazimipour, 2018] in a rigorous way. What is important is not the differences in the surface features of the CP-nets, e.g., a single statement or dependency, but rather the distance between their induced partial orders. Even a small difference in a CP-net could generate a very different partial order, depending on which feature is involved in the modification. While the metrics proposed by Loreggia et al. [2018a] are well grounded, they are computationally hard to compute in general, and approximations must be used.

We envision the use of CP-nets to solve one part of the value alignment problem [Russell et al., 2015; Loreggia et al., 2018c; Loreggia et al., 2018b]. This is part of a broader research program we call Ethically Bounded AI, which seeks to understand how to harness the power of AI yet prevent these systems from making choices we do not consider ethical [Rossi and Mattei, 2019]. In this work we envision using a distance metric to measure the difference between an individual agent's preferences over actions and another ordering given by, e.g., ethics, norms, or business values [Loreggia et al., 2018c; Loreggia et al., 2018b].
Following work in metric learning over structured representations [Bellet et al., 2015], we wish to learn the distance between partial orders represented compactly as CP-nets. We do not want to work with the partial orders directly as they may be exponentially larger than the CP-net representation. Informally, given two CP-nets, we wish to estimate the distance between their induced partial orders using a neural network. The number of possible CP-nets grows extremely fast, from 481,776 for 4 binary features to over 5.24 × 10^40 with 7 binary features [Allen et al., 2017]. However, the computation time of the approximation algorithm proposed by Loreggia et al. [2018a] scales linearly with the number of features; hence new methods must be explored. Therefore, leveraging the inferential properties of neural networks may help us make CP-nets more useful as a preference reasoning formalism.

Contributions. We propose using metric learning as a tool to practically solve aspects of the value alignment problem. We propose to model both user preferences and ethical priorities over actions using the CP-net formalism, and we demonstrate how one can use a state-of-the-art neural network formulation, CPMETRIC, to quickly and accurately judge distances between preferred actions and ethical actions. We evaluate our models and metrics on generated CP-nets and show that CPMETRIC leads to a speed-up in computation while still being accurate.

2 CP-nets

Conditional Preference networks (CP-nets) are a graphical model for compactly representing conditional and qualitative preference relations [Boutilier et al., 2004]. CP-nets are comprised of sets of ceteris paribus preference statements (cp-statements). For instance, the cp-statement "I prefer red wine to white wine if meat is served" asserts that, given two meals that differ only in the kind of wine served and both containing meat, the meal with red wine is preferable to the meal with white wine. CP-nets have been extensively used in the preference reasoning, preference learning, and social choice literature as a formalism for working with qualitative preferences [Domshlak et al., 2011; Rossi et al., 2011; Brandt et al., 2016]. CP-nets have even been used to compose web services [Wang et al., 2009] and other decision aid systems [Pu et al., 2011].

Formally, a CP-net has a set of features (or variables) F = {X1, . . . , Xn} with finite domains D(X1), . . . , D(Xn). For each feature Xi, we are given a set of parent features Pa(Xi) that can affect the preferences over the values of Xi. This defines a dependency graph in which each node Xi has Pa(Xi) as its immediate predecessors. An acyclic CP-net is one in which the dependency graph is acyclic. Given this structural information, one needs to specify the preference over the values of each variable Xi for each complete assignment to the parent variables, Pa(Xi). This preference is assumed to take the form of a total or partial order over D(Xi). A cp-statement for some feature Xi that has parents Pa(Xi) = {x1, . . . , xn} and domain D(Xi) = {a1, . . . , am} is a total ordering over D(Xi) and has the general form x1 = v1, x2 = v2, . . . , xn = vn : a1 ≻ . . . ≻ am, where for each xi ∈ Pa(Xi), xi = vi is an assignment to a parent of Xi with vi ∈ D(xi). The set of cp-statements regarding a certain variable Xi is called the cp-table for Xi.
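As a concrete illustration, a cp-table can be read as a lookup from complete parent assignments to a total order over the variable's domain. The following sketch is our own illustration, not the authors' implementation (all names are hypothetical); it encodes two cp-tables from the example of Figure 1, introduced below:

```python
# A minimal sketch (not the authors' code): a cp-table as a Python dict
# mapping each complete parent assignment to a total order over the
# variable's domain, most preferred value first. "x_bar" stands for x̄.

# Unconditional preference for A (no parents): a ≻ ā.
cp_table_A = {(): ["a", "a_bar"]}

# Conditional preferences for C with parents A and B (see Figure 1):
cp_table_C = {
    ("a", "b"):         ["c", "c_bar"],   # (a ∧ b): c ≻ c̄
    ("a_bar", "b_bar"): ["c", "c_bar"],   # (ā ∧ b̄): c ≻ c̄
    ("a", "b_bar"):     ["c_bar", "c"],   # (a ∧ b̄): c̄ ≻ c
    ("a_bar", "b"):     ["c_bar", "c"],   # (ā ∧ b): c̄ ≻ c
}
```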
Consider the CP-net depicted graphically in Figure 1 (left) with features A, B, C, and D. Figure 1 (right) gives the full induced preference order for the CP-net. Each variable has a binary domain containing f and f̄ if F is the name of the feature. All cp-statements in the CP-net are: a ≻ ā; b ≻ b̄; (a ∧ b) : c ≻ c̄; (ā ∧ b̄) : c ≻ c̄; (a ∧ b̄) : c̄ ≻ c; (ā ∧ b) : c̄ ≻ c; c : d ≻ d̄; c̄ : d̄ ≻ d. Here, statement a ≻ ā represents the unconditional preference for A = a over A = ā, while statement c : d ≻ d̄ states that D = d is preferred to D = d̄ given that C = c. The semantics of CP-nets depends on the notion of a worsening flip: a change in the value of a variable to a less preferred value according to the cp-statement for that variable. For example, in the CP-net above, passing from abcd to abc̄d is a worsening flip since c is better than c̄ given a and b. One outcome α is preferred to, or dominates, another outcome β (written α ≻ β) if and only if there is a chain of worsening flips from α to β. This definition induces a preorder over the outcomes, which is a partial order if the CP-net is acyclic [Boutilier et al., 2004], as depicted in Figure 1 (right).

The complexity of dominance and consistency testing in CP-nets is an area of active study in preference reasoning [Goldsmith et al., 2008; Rossi et al., 2011]. Finding the optimal outcome of a CP-net is NP-hard [Boutilier et al., 2004] in general, but it can be found in polynomial time for acyclic CP-nets by assigning the most preferred value in each cp-table. Indeed, acyclic CP-nets induce a lattice over the outcomes as (partially) depicted in Figure 1 (right). The induced preference ordering, Figure 1 (right), can be exponentially larger than the CP-net, Figure 1 (left), which motivates learning a metric using only the (more compact) CP-net.

[Figure 1: A CP-net with n = 4 features (left) and part of the induced partial order (right). Note that the partial order is over all 2^n = 16 possible combinations and arrows denote the dominance relation. We have arranged the nodes so that each is one flip between the levels.]
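To make the flipping semantics concrete, the sketch below (again our own illustration with hypothetical names; the binary values 1 and 0 stand for f and f̄) enumerates worsening flips for the Figure 1 CP-net and tests dominance by searching for a chain of worsening flips:

```python
from collections import deque

# Sketch of worsening-flip semantics for the Figure 1 CP-net.
# 1 stands for the unbarred value (e.g., a), 0 for the barred one (ā).
VARS = ["A", "B", "C", "D"]
PARENTS = {"A": [], "B": [], "C": ["A", "B"], "D": ["C"]}
PREFERRED = {                           # preferred value per parent assignment
    ("A", ()): 1, ("B", ()): 1,         # a ≻ ā, b ≻ b̄
    ("C", (1, 1)): 1, ("C", (0, 0)): 1, # (a ∧ b), (ā ∧ b̄): c ≻ c̄
    ("C", (1, 0)): 0, ("C", (0, 1)): 0, # (a ∧ b̄), (ā ∧ b): c̄ ≻ c
    ("D", (1,)): 1, ("D", (0,)): 0,     # c: d ≻ d̄; c̄: d̄ ≻ d
}

def worsening_flips(outcome):
    """Yield all outcomes exactly one worsening flip away from outcome."""
    for i, v in enumerate(VARS):
        ctx = tuple(outcome[VARS.index(p)] for p in PARENTS[v])
        if outcome[i] == PREFERRED[(v, ctx)]:  # flipping now makes it worse
            flipped = list(outcome)
            flipped[i] = 1 - flipped[i]
            yield tuple(flipped)

def dominates(alpha, beta):
    """alpha ≻ beta iff a chain of worsening flips leads from alpha to beta."""
    if alpha == beta:
        return False
    frontier, seen = deque([alpha]), {alpha}
    while frontier:
        for nxt in worsening_flips(frontier.popleft()):
            if nxt == beta:
                return True
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return False

# abcd dominates abc̄d via the single worsening flip discussed above:
assert dominates((1, 1, 1, 1), (1, 1, 0, 1))
```

Note that this brute-force search visits up to all 2^n outcomes, consistent with the hardness of dominance testing cited above; it is illustrative only.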
3 Preferences and Ethical Priorities

In what follows we assume that we can describe a user's behavior through her preferences over the features of the domain. This can be modeled as a CP-net, and it gives us an ordering over the actions the user would like to take. The example given in Loreggia et al. [2018c] concerns the driver of a vehicle: they may want to go as fast as possible and run over certain small animals. However, there may be some overall ethical or moral guidelines (priorities) the system must follow. In this case we want to evaluate the difference, or distance, between the individual user and the society. This idea of morality as an ordering over actions was first proposed by Sen [1974].

To operationalize this system we wish to describe both preferences and priorities using the CP-net formalism, along with a notion of distance in the metric space of CP-nets. This enables us and the system to understand whether users' preferences are close enough to the moral principles or not. Eventually, when preferences deviate from the desired behavior, we can use CP-nets, since they induce an ordering, to find a trade-off so that the quality of the decision with respect to the subjective preferences does not significantly degrade when conforming to the ethical principles [Loreggia et al., 2018c; Loreggia et al., 2018b]. In this way we have bounded the behavior of the system to be ethical while still being responsive to the user preferences [Rossi and Mattei, 2019].

Traditional reasoning and learning approaches in AI provide different and complementary capabilities to an autonomous agent. Symbolic and logical reasoning allows these agents to manipulate symbols and perform inference, while machine learning techniques can learn and optimize many ill-defined problems from large amounts of data. As ongoing work, we intend to study the use of both kinds of approaches to model, learn, and reason with both preferences and ethical principles. Inspired by the System 1 and System 2 theory of Daniel Kahneman [Kahneman, 2011], we will define a dual-agent architecture that will provide autonomous agents with the ability to combine symbolic and accurate reasoning with data interpretation and learning, for both preferences and ethical principles. This will allow machines to be flexible and context-dependent in how they handle and combine these two sources of information for decision making [Rossi and Loreggia, 2019; Rossi and Mattei, 2019].

The combined use of deep learning techniques and logical reasoning formalisms is an exciting research direction to find principled ways to develop AI systems that are both accountable and able to explain themselves. We hope that these approaches will be able to overcome limitations of the "black-box paradigm" in the machine learning discipline [Rossi and Loreggia, 2019].

4 Metric Learning on CP-nets

Metric learning algorithms aim to learn a metric (or distance function) over a set of training points or samples [Sohn, 2016]. The importance of metrics has grown in recent years with the use of these functions in many different domains: from clustering to information retrieval and from recommender systems to preference aggregation. For instance, many clustering algorithms, like k-Means, and classification algorithms, including k-Nearest Neighbors, use a distance value between points.

Formally, a metric space is a pair (M, d) where M is a set of elements and d is a function d : M × M → R satisfying four criteria. Given any three elements A, B, C ∈ M, d must satisfy:
1. d(A, B) ≥ 0: there must be a value for all pairs;
2. d(A, B) = d(B, A): d must be symmetric;
3. d(A, B) ≤ d(A, C) + d(C, B): d must satisfy the triangle inequality; and
4. d(A, B) = 0 if and only if A = B: d is zero if and only if the two elements are the same.

Xing et al. [2002] first formalized the problem of metric learning, i.e., learning the metric directly from samples rather than formally specifying the function d. This approach requires training data, meaning that we have some oracle that is able to give the value of the metric for each pair. The success of deep learning in many different domains [Krizhevsky et al., 2012] has led many researchers to apply these approaches to the field of metric learning, resulting in a number of important results [Bellet et al., 2015].
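These four axioms can be spot-checked empirically for any candidate distance function. The sketch below is our own illustration (check_metric_axioms is a hypothetical helper, not part of CPMETRIC); a failure on any sampled pair or triple disproves metricity, while success on a sample is of course not a proof:

```python
import itertools

def check_metric_axioms(elements, d, tol=1e-9):
    """Spot-check the four metric axioms for d on a finite sample of
    elements. A single failure disproves that d is a metric; passing on
    a sample does not prove it."""
    for a, b in itertools.product(elements, repeat=2):
        assert d(a, b) >= -tol                     # (1) non-negativity
        assert abs(d(a, b) - d(b, a)) <= tol       # (2) symmetry
        assert (d(a, b) <= tol) == (a == b)        # (4) zero iff identical
    for a, b, c in itertools.product(elements, repeat=3):
        assert d(a, b) <= d(a, c) + d(c, b) + tol  # (3) triangle inequality

# Sanity check with a known metric on a small set of integers:
check_metric_axioms(range(5), lambda x, y: abs(x - y))
```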
In this work we focus on metric spaces (M, d) where M is a set of CP-nets. Given this, we want to learn the distance d which best approximates the Kendall tau distance (KTD) [Kendall, 1938] between the induced partial orders. Informally, the Kendall tau distance between two orderings is the number of pairs that are discordant, i.e., not ordered in the same way, in both orderings. This distance metric, extended to partial orders (Definition 1), was shown to be a metric on the space of CP-nets by Loreggia et al. [2018a]. To extend the classic KTD to CP-nets, a penalty parameter p defined for partial rankings [Fagin et al., 2006] was extended to the case of partial orders. Loreggia et al. [2018a] assume that all CP-nets are acyclic and in minimal (non-degenerate) form, i.e., all arcs in the dependency graph have a real dependency expressed in the cp-statements, a standard assumption in the CP-net literature (see, e.g., [Allen et al., 2017; Boutilier et al., 2004]).

Definition 1. Given two CP-nets A and B inducing partial orders P and Q over the same unordered set of outcomes U:

KTD(A, B) = KT(P, Q) = Σ_{i,j ∈ U, i ≠ j} K^p_{i,j}(P, Q),

where i and j are two outcomes with i ≠ j (i.e., we iterate over all unique pairs) and:
1. K^p_{i,j}(P, Q) = 0 if i, j are ordered in the same way, or are incomparable, in both P and Q;
2. K^p_{i,j}(P, Q) = 1 if i, j are ordered inversely in P and Q;
3. K^p_{i,j}(P, Q) = p, with 0.5 ≤ p < 1, if i, j are ordered in P and incomparable in Q (resp. ordered in Q and incomparable in P).

To make this distance scale invariant, i.e., a value in [0, 1], it is divided by |U|.
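A direct reading of Definition 1, for partial orders small enough to enumerate, might look like the following sketch (our own illustration; the pairwise-relation encoding is an assumption, and the paper's normalization step is noted in a comment):

```python
def ktd(rel_p, rel_q, outcomes, p=0.75):
    """Sketch of Definition 1. rel_x(i, j) returns 1 if i ≻ j, -1 if
    j ≻ i, and 0 if i and j are incomparable in that partial order.
    The penalty must satisfy 0.5 <= p < 1. The paper additionally
    divides by |U| to normalize the result into [0, 1]."""
    total = 0.0
    for a in range(len(outcomes)):
        for b in range(a + 1, len(outcomes)):   # each unordered pair once
            rp = rel_p(outcomes[a], outcomes[b])
            rq = rel_q(outcomes[a], outcomes[b])
            if rp == rq:
                continue      # ordered the same way, or incomparable in both
            elif rp != 0 and rq != 0:
                total += 1    # ordered inversely
            else:
                total += p    # ordered in one, incomparable in the other
    return total

# Tiny demo: P orders x ≻ y ≻ z; Q only orders x ≻ y, leaving z incomparable.
P = {("x", "y"): 1, ("x", "z"): 1, ("y", "z"): 1}
Q = {("x", "y"): 1}
rel = lambda order: (lambda i, j: order.get((i, j), -order.get((j, i), 0)))
print(ktd(rel(P), rel(Q), ["x", "y", "z"]))   # two p-penalties: 1.5
```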
CP-nets present two important challenges when used for metric learning. The first is that we are attempting to learn a metric via a compact representation of a partial order. We are not learning over the partial orders induced by the CP-nets directly, as they could be exponentially larger than the CP-nets. The second challenge is the encoding of the graphical structure itself. Graph learning with neural networks is still an active and open area of research; Goyal and Ferrara [2017] give a complete survey of recent work as well as a Python library of implementations for many of these techniques. Most of these works focus on finding good embeddings for the nodes of the network and then using collections of these learned embeddings to represent the graph for, e.g., particular segmentation or link prediction tasks. None of these techniques have been applied to embedding graphs for metric learning.

5 Structure of CPMETRIC

In our task the metric space is (M, d) where M is a set of compact, graphical preferences that induce a partial order, and our goal is to learn the metric d only from the compact, graphical representation. The key challenge is the need to find a vector representation of not only the graph but also the cp-statements. We briefly define the representations used for CPMETRIC here; for a complete overview, see Loreggia et al. [2019].

We represent a CP-net I over m features using two matrices. First is the adjacency matrix adj_I, which represents the dependency graph of the CP-net and is an m × m matrix of 0s and 1s. The second matrix represents the list of cp-statements, cpt_I, which is an m × 2^(m−1) matrix, where each row represents a variable Xi ∈ F and each column represents a complete assignment to the variables in F \ Xi. The list is built following a topological ordering of the variables in the CP-net. Each cell cpt_I(i, j) stores the preference value for the ith variable given the jth assignment to the variables in F \ Xi.

The set of training examples X = {x1, . . . , xn} is made up of pairs of CP-nets represented through their normalized Laplacians and their cp-statements. The set of corresponding labels is Y = {y1, . . . , yn}ᵀ, where each yi ∈ [0, 1] is the normalized value of KTD between the CP-nets in xi. Each xi ∈ X is then a tuple (L_A, cpt_A, L_B, cpt_B) representing a pair of CP-nets (A, B) by their Laplacians, L_A and L_B, and the encodings of their cp-statements, cpt_A and cpt_B.
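The sketch below is our own reconstruction under stated assumptions, not the authors' code: it builds the two matrices and a Laplacian for the binary Figure 1 CP-net. Storing the preferred value in each cpt_I cell and using the symmetric normalized Laplacian of the (undirected) dependency graph are both our assumptions; see Loreggia et al. [2019] for the exact construction.

```python
import numpy as np
from itertools import product

# Figure 1 CP-net; the variable order (A, B, C, D) is already topological.
names = ["A", "B", "C", "D"]
parents = {"A": [], "B": [], "C": ["A", "B"], "D": ["C"]}
preferred = {
    ("A", ()): 1, ("B", ()): 1,
    ("C", (1, 1)): 1, ("C", (0, 0)): 1, ("C", (1, 0)): 0, ("C", (0, 1)): 0,
    ("D", (1,)): 1, ("D", (0,)): 0,
}
m = len(names)

# adj_I: adj[i, j] = 1 iff variable i is a parent of variable j.
adj = np.zeros((m, m))
for j, v in enumerate(names):
    for p in parents[v]:
        adj[names.index(p), j] = 1

# cpt_I: one row per variable, one column per complete assignment to the
# other m - 1 variables (2^(m-1) columns for binary domains); each cell
# stores the preference value of the row variable under that assignment.
cpt = np.zeros((m, 2 ** (m - 1)))
for i, v in enumerate(names):
    others = [w for w in names if w != v]
    for j, assign in enumerate(product([0, 1], repeat=m - 1)):
        ctx = tuple(assign[others.index(p)] for p in parents[v])
        cpt[i, j] = preferred[(v, ctx)]

# Symmetric normalized Laplacian of the undirected dependency graph
# (our assumed normalization): L = I - D^{-1/2} A D^{-1/2}.
und = np.maximum(adj, adj.T)
deg = und.sum(axis=1)
d_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(deg, 1)))  # guard isolated nodes
laplacian = np.eye(m) - d_inv_sqrt @ und @ d_inv_sqrt
```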
6 Experiment

CPMETRIC is trained to learn the KTD metric by varying the number of features of the CP-nets n ∈ {3, . . . , 7} and using two different autoencoders. For a complete discussion of the performance of CPMETRIC on the standard classification and regression tasks in comparison with I-CPD, including a discussion of tuning hyperparameters, see Loreggia et al. [2019]. In this paper we focus on what we call the comparison task that is necessary for value alignment systems. Informally, this is the task where, given two CP-nets, say one representing the preferences of the user and one representing a set of ethical values or constraints, we want to decide if some third CP-net, say the course of action to be taken, is closer to the preferences or closer to the constraints.

6.1 Data Generation and Training

For each number of features n ∈ {3, . . . , 7} we generate 1000 CP-nets uniformly at random using the generators from Allen et al. [2017]. This set of CP-nets is split into a training-generative-set (900 CP-nets) and a test-generative-set (100 CP-nets) in 10 different ways to give us 10-fold cross validation. For each fold we compute the training and test dataset comprised of all, e.g., (900 choose 2), possible pairs of CP-nets from the training-generative-set and test-generative-set, respectively, along with the value of KTD for that pair. While we generate the CP-nets themselves uniformly at random, observe that this creates an unbalanced set of distances – it induces a normal distribution – and hence our sets are unbalanced. Figure 2 shows the distribution of CP-net pairs over 20 intervals for all CP-nets generated for n ∈ {3, . . . , 7}. While our classification experiments are for m = 10 classes, dividing the interval into 20 classes provides a better visualization of the challenge of obtaining training samples at the edges of the distribution.

[Figure 2: Histogram of the number of CP-net pairs per interval across all experimental datasets. CP-net pairs are not distributed uniformly in the class intervals.]

We ran a preliminary experiment on balancing our dataset by sub-sampling the training and test datasets. In these small experiments, performance was much worse than performance on the unbalanced dataset. Because we are learning a metric, for each CP-net A, there is only one CP-net B such that KTD(A, B) = 1 and only one CP-net C such that KTD(A, C) = 0. Consequently, attempting to balance or hold out CP-nets from test or train can lead to poor performance. We conjecture that in order to improve this task we should perform some kind of data augmentation, but this would introduce more subjective assumptions on how and where data should be augmented [Wong et al., 2016].

All training was done on a machine with 2 x Intel(R) Xeon(R) CPU E5-2670 @ 2.60GHz, 128GB of RAM, and one NVidia K20 GPU. We train CPMETRIC for 70 epochs using the Adam optimizer [Kingma and Ba, 2014]. For each number of features of the CP-net n we use all (900 choose 2) pairs in the training-set. There are only 488 binary CP-nets with 3 features [Allen et al., 2017]; hence, for n = 3 the training-set is 17K samples, while for n > 3 the number of samples in the training-set is 800K. Both the Autoencoder and Siamese Autoencoder models are trained for 100 epochs using the Adam optimizer [Kingma and Ba, 2014] on the same training-set. Model weights from the best performing epoch are saved and subsequently transferred to the deep neural network used to learn the distance function.

The training and validation loss for the autoencoder is shown in Figure 3. Observe that the loss for the CPT representation approaches zero after only 3 epochs for both the training and validation phases. The same trend holds for the adjacency matrix, though the loss converges to ≈ 0.15.

[Figure 3: Performance of the autoencoder on the validation and training set for 10 epochs.]

6.2 Comparison Task Performance

For many applications we are not concerned with the true value of the distance but rather with deciding which of two preferences is closer to a reference point. For example, in product recommendation we may want to display the closer of two objects and not care about computing the values [Pu et al., 2011]. Formally, the qualitative comparison task takes a set of CP-net triples (A, B, C), where A is a reference CP-net, and the task is to decide which other CP-net, B or C, is closer to A. We generate uniformly at random 1000 triples of CP-nets for each n ∈ {3, . . . , 7}. For each triple (A, B, C) we compute both KTD(A, B) and KTD(A, C) to establish ground truth and use our regression networks to predict the distances between (A, B) and (A, C).
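Evaluation on this task reduces to checking whether the learned distance and the ground-truth KTD agree on which candidate is closer. The sketch below is our own illustration of the scoring loop, with integers standing in for CP-nets and a hypothetical noisy predictor in place of the trained network:

```python
import random

def comparison_accuracy(triples, true_dist, pred_dist):
    """Fraction of triples (A, B, C) on which the predicted distance
    agrees with ground truth about whether B or C is closer to A."""
    correct = 0
    for a, b, c in triples:
        truth = true_dist(a, b) < true_dist(a, c)
        guess = pred_dist(a, b) < pred_dist(a, c)
        correct += (truth == guess)
    return correct / len(triples)

# Demo with stand-in objects and a noisy "model":
random.seed(0)
triples = [tuple(random.sample(range(100), 3)) for _ in range(1000)]
true_d = lambda x, y: abs(x - y)
noisy_d = lambda x, y: abs(x - y) + random.gauss(0.0, 2.0)
print(comparison_accuracy(triples, true_d, noisy_d))
```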
Table 1 displays the accuracy, as a percentage out of 1000 trials, of our three CPMETRIC architectures versus I-CPD for this task; Table 2 gives the average runtime per pair, averaged over all 1000 trials. The standard deviations in Table 1 are across the 10 folds of the training/test set. For all of our networks we obtain an accuracy above 85%, all the networks perform about the same on this task (±2.0%), and the trend for accuracy is flat across the size of the CP-nets. It is interesting to note that on this task neither of the autoencoders was able to significantly improve performance as they did for the quantitative comparison tasks. While the results are inconclusive, as all instances of CPMETRIC performed about the same, it will be interesting to see if there are autoencoder architectures that are more suited to the comparison task.

N   No Autoencoder   Autoencoder      Siamese Autoencoder   I-CPD
3   85.01% (2.01%)   85.76% (2.29%)   85.47% (2.32%)        91.80%
4   91.17% (0.92%)   91.38% (1.10%)   91.78% (1.13%)        92.90%
5   88.40% (0.91%)   89.36% (1.08%)   89.18% (1.08%)        90.80%
6   87.33% (0.80%)   87.17% (1.33%)   86.79% (1.84%)        90.10%
7   84.79% (1.16%)   84.57% (1.14%)   85.12% (0.86%)        89.90%

Table 1: Accuracy on triples for the various network architectures on the qualitative comparison task, as well as the performance of I-CPD. While our networks do not achieve the best performance on this task, they are competitive with the more costly approximation algorithm I-CPD.

A positive takeaway is that, as Table 2 shows, we achieve a sub-linear increase in inference time for our model. I-CPD scales linearly with the description size of the CP-net, so the neural network does, after training, offer the ability to compare CP-nets of larger sizes in a practical amount of time. This gives us hope that while the metric itself is NP-hard to compute in a direct way, we can use the power of deep learning to enable systems that could be practically deployed.

N   I-CPD              Autoencoder Neural Network
3   0.69 (0.48) msec   0.087 (0.004) msec
4   1.09 (0.33) msec   0.098 (0.004) msec
5   1.85 (0.49) msec   0.100 (0.005) msec
6   3.16 (0.74) msec   0.114 (0.003) msec
7   4.65 (0.86) msec   0.138 (0.001) msec

Table 2: Comparison of the mean runtime for a single triple over 1000 trials on the qualitative comparison task for the neural network and I-CPD.

Unfortunately, the trend for accuracy is negative when the number of features increases. However, this is also the case for the I-CPD approximation, and both metrics seem to be losing accuracy at about the same rate. The loss in accuracy for the neural network models could be caused by the unbalanced nature of the training and testing datasets. Again, as shown in Figure 2, generating the CP-nets themselves uniformly at random does not give us a uniform distribution over distances, and correcting this may give better performance.

Our conjecture for the slight disadvantage of CPMETRIC relative to I-CPD has to do with the directionality of the errors. When training the network we are optimizing for accuracy on the regression task, where it does not matter whether CPMETRIC overestimates or underestimates by a small amount. However, for the comparison task it may matter a lot if the direction of our errors is random. An important direction for future work is to try different optimization objectives when training the network to see if this bias is the reason for the underperformance.

7 Conclusion

In this paper we have discussed how to use CPMETRIC, a novel neural network model, to learn a metric (distance) function between the partial orders induced by CP-nets, a compact, structured preference representation, to enable practical value alignment systems. To our knowledge this is the first use of neural networks to learn and measure preferences for the value alignment problem. We feel that this is an interesting and fruitful direction for research in the AI Safety domain, as we must develop practical and efficient tools that can be used to effectively harness the power of AI systems.

Important directions for future work include integrating novel graph learning techniques into our networks and extending our work to other formalisms including, e.g., PCP-nets [Cornelio et al., 2013] and LP-trees [Li and Kazimipour, 2018]. PCP-nets are a particularly interesting direction as they have been proposed as an efficient way to model uncertainty over the preferences of a single or multiple agents [Cornelio et al., 2015]. Another important extension involves setting contexts for different preference and ethical priority encodings. CP-nets and many other preference formalisms model a particular domain but do not give us any insight into when we may need to pass between one context and another.
References

[Allen et al., 2015] T. E. Allen, M. Chen, J. Goldsmith, N. Mattei, A. Popova, M. Regenwetter, F. Rossi, and C. Zwilling. Beyond theory and data in preference modeling: Bringing humans into the loop. In Proc. 4th ADT, 2015.
[Allen et al., 2017] T. E. Allen, J. Goldsmith, H. E. Justice, N. Mattei, and K. Raines. Uniform random generation and dominance testing for CP-nets. Journal of Artificial Intelligence Research, 59:771–813, 2017.
[Amor et al., 2016] N. B. Amor, D. Dubois, H. Gouider, and H. Prade. Graphical models for preference representation: An overview. In Proc. 10th International Conference on Scalable Uncertainty Management (SUM), pages 96–111, 2016.
[Bache and Lichman, 2013] K. Bache and M. Lichman. UCI Machine Learning Repository, 2013. University of California, Irvine, School of Information and Computer Sciences.
[Bellet et al., 2015] A. Bellet, A. Habrard, and M. Sebban. Metric Learning. Synthesis Lectures on Artificial Intelligence and Machine Learning. Morgan & Claypool Publishers, 2015.
[Boutilier et al., 2004] C. Boutilier, R. Brafman, C. Domshlak, H. H. Hoos, and D. Poole. CP-nets: A tool for representing and reasoning with conditional ceteris paribus preference statements. Journal of Artificial Intelligence Research, 21:135–191, 2004.
[Brandt et al., 2016] F. Brandt, V. Conitzer, U. Endriss, J. Lang, and A. D. Procaccia, editors. Handbook of Computational Social Choice. Cambridge University Press, 2016.
[Cornelio et al., 2013] C. Cornelio, J. Goldsmith, N. Mattei, F. Rossi, and K. B. Venable. Updates and uncertainty in CP-nets. In Proc. 26th AUSAI, 2013.
[Cornelio et al., 2015] C. Cornelio, U. Grandi, J. Goldsmith, N. Mattei, F. Rossi, and K. B. Venable. Reasoning with PCP-nets in a multi-agent context. In Proc. 14th AAMAS, 2015.
[Domshlak et al., 2011] C. Domshlak, E. Hüllermeier, S. Kaci, and H. Prade. Preferences in AI: An overview. Artificial Intelligence, 175(7):1037–1052, 2011.
[Fagin et al., 2006] R. Fagin, R. Kumar, M. Mahdian, D. Sivakumar, and E. Vee. Comparing partial rankings. SIAM Journal on Discrete Mathematics, 20(3):628–648, 2006.
[Fattah et al., 2018] S. M. M. Fattah, A. Bouguettaya, and S. Mistry. A CP-net based qualitative composition approach for an IaaS provider. In Proc. International Conference on Web Information Systems Engineering, pages 151–166. Springer, 2018.
[Fürnkranz and Hüllermeier, 2010] J. Fürnkranz and E. Hüllermeier. Preference Learning. Springer, 2010.
[Goldsmith and Junker, 2009] J. Goldsmith and U. Junker. Preference handling for artificial intelligence. AI Magazine, 29(4), 2009.
[Goldsmith et al., 2008] J. Goldsmith, J. Lang, M. Truszczyński, and N. Wilson. The computational complexity of dominance and consistency in CP-nets. Journal of Artificial Intelligence Research, 33(1):403–432, 2008.
[Goyal and Ferrara, 2017] P. Goyal and E. Ferrara. Graph embedding techniques, applications, and performance: A survey. CoRR, abs/1705.02801, 2017.
[Kahneman, 2011] D. Kahneman. Thinking, Fast and Slow. Farrar, Straus and Giroux, New York, 2011.
[Kambhampati, 2019] S. Kambhampati. Synthesizing explainable behavior for human-AI collaboration. In Proc. 18th AAMAS, 2019.
[Kendall, 1938] M. G. Kendall. A new measure of rank correlation. Biometrika, 30(1/2):81–93, 1938.
[Kingma and Ba, 2014] D. P. Kingma and J. Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
[Krizhevsky et al., 2012] A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems (NeurIPS), pages 1097–1105, 2012.
[Li and Kazimipour, 2018] M. Li and B. Kazimipour. An efficient algorithm to compute distance between lexicographic preference trees. In Proc. 27th IJCAI, pages 1898–1904, 2018.
[Loreggia et al., 2018a] A. Loreggia, N. Mattei, F. Rossi, and K. B. Venable. On the distance between CP-nets. In Proc. 17th AAMAS, 2018.
[Loreggia et al., 2018b] A. Loreggia, N. Mattei, F. Rossi, and K. B. Venable. Preferences and ethical principles in decision making. In Proc. 1st AAAI/ACM Conference on AI, Ethics, and Society (AIES), 2018.
[Loreggia et al., 2018c] A. Loreggia, N. Mattei, F. Rossi, and K. B. Venable. Value alignment via tractable preference distance. In R. V. Yampolskiy, editor, Artificial Intelligence Safety and Security, chapter 18. CRC Press, 2018.
[Loreggia et al., 2019] A. Loreggia, N. Mattei, F. Rossi, and K. B. Venable. CPMETRIC: Deep Siamese networks for learning distances between structured preferences. arXiv preprint arXiv:1809.08350, 2019.
[Mattei and Walsh, 2013] N. Mattei and T. Walsh. PrefLib: A library for preferences, http://www.preflib.org. In Proc. 3rd ADT, 2013.
[Mattei and Walsh, 2017] N. Mattei and T. Walsh. A PrefLib.org retrospective: Lessons learned and new directions. In U. Endriss, editor, Trends in Computational Social Choice, chapter 15, pages 289–309. AI Access Foundation, 2017.
[Popova et al., 2013] A. Popova, M. Regenwetter, and N. Mattei. A behavioral perspective on social choice. Annals of Mathematics and Artificial Intelligence, 68(1–3):135–160, 2013.
[Pu et al., 2011] P. Pu, B. Faltings, L. Chen, J. Zhang, and P. Viappiani. Usability guidelines for product recommenders based on example critiquing research. In F. Ricci, L. Rokach, B. Shapira, and P. B. Kantor, editors, Recommender Systems Handbook, pages 511–545. Springer, 2011.
[Rossi and Loreggia, 2019] F. Rossi and A. Loreggia. Preferences and ethical priorities: Thinking fast and slow in AI. In Proc. 18th AAMAS, pages 3–4, 2019.
[Rossi and Mattei, 2019] F. Rossi and N. Mattei. Building ethically bounded AI. In Proc. 33rd AAAI, 2019.
[Rossi et al., 2011] F. Rossi, K. B. Venable, and T. Walsh. A Short Introduction to Preferences: Between Artificial Intelligence and Social Choice. Morgan & Claypool, 2011.
[Russell et al., 2015] S. Russell, D. Dewey, and M. Tegmark. Research priorities for robust and beneficial artificial intelligence. AI Magazine, 36(4):105–114, 2015.
[Sen, 1974] A. Sen. Choice, Ordering, and Morality. Blackwell, 1974.
[Sohn, 2016] K. Sohn. Improved deep metric learning with multi-class n-pair loss objective. In Advances in Neural Information Processing Systems (NeurIPS), pages 1857–1865, 2016.
[Wang et al., 2009] H. Wang, S. Shao, X. Zhou, C. Wan, and A. Bouguettaya. Web service selection with incomplete or inconsistent user preferences. In Proc. 7th International Conference on Service-Oriented Computing, pages 83–98. Springer, 2009.
[Wong et al., 2016] S. C. Wong, A. Gatt, V. Stamatescu, and M. D. McDonnell. Understanding data augmentation for classification: When to warp? In Proc. 2016 International Conference on Digital Image Computing: Techniques and Applications (DICTA), pages 1–6, 2016.
[Xing et al., 2002] E. P. Xing, A. Y. Ng, M. I. Jordan, and S. J. Russell. Distance metric learning with application to clustering with side-information. In Proc. 15th NeurIPS, pages 505–512, 2002.