Metric Learning for Value Alignment

Andrea Loreggia¹, Nicholas Mattei², Francesca Rossi³, K. Brent Venable²,⁴
¹ University of Padova, Department of Mathematics, Padova, Italy
² Tulane University, Department of Computer Science, New Orleans, LA, USA
³ IBM Research, IBM T.J. Watson Research Center, Yorktown Heights, NY, USA
⁴ Institute for Human and Machine Cognition (IHMC), Pensacola, FL, USA

Abstract

Preferences are central to decision making by both machines and humans. Representing, learning, and reasoning with preferences is an important area of study both within computer science and across the social sciences. When we give our preferences to an AI system we expect the system to make decisions or recommendations that are consistent with our preferences, but the decisions should also adhere to certain norms, guidelines, and ethical principles. Hence, when working with preferences it is necessary to understand and compute a metric (distance) between preferences – especially if we encode both user preferences and ethical systems in the same formalism. In this paper we investigate the use of CP-nets as a formalism for representing orderings over actions for AI systems. We leverage a recently proposed metric for CP-nets and a neural network architecture, CPMETRIC, for computing this metric. Using these two tools we look at how one can build a fast and flexible value alignment system.

1 Introduction

Preferences are central to individual and group decision making by both computer systems and humans. Due to this central role in decision making, the study of representing [Rossi et al., 2011], learning [Fürnkranz and Hüllermeier, 2010], and reasoning [Domshlak et al., 2011] with preferences is a focus of study within computer science and in many other disciplines including psychology and sociology [Goldsmith and Junker, 2009]. Individuals express their preferences in many different ways: pairwise comparisons, rankings, approvals (likes), positive or negative examples, and many more; examples are collected in various libraries and databases [Mattei and Walsh, 2013; Mattei and Walsh, 2017; Bache and Lichman, 2013]. A core task in working with preferences is understanding the relationship between preferences. This often takes the form of a dominance query, i.e., which item is more or most preferred, or a distance measure, i.e., which object is the closest to my stated preference. These types of reasoning are important in many domains including recommender systems [Fattah et al., 2018], collective decision making [Brandt et al., 2016], and value alignment systems [Russell et al., 2015; Loreggia et al., 2018c; Loreggia et al., 2018b].

Using a formal structure to model preferences, especially one that directly models dependency, can be useful for reasoning. For example, it can support reasoning based on inference and causality, and provide more transparency and explainability, since the preferences are explicitly represented and hence scrutable [Kambhampati, 2019]. A number of compact preference representation languages have been developed in the literature for representing and reasoning with preferences; see the work of Amor et al. [2016] for a survey of compact graphical models. We specifically focus on conditional preference networks (CP-nets) [Boutilier et al., 2004].

CP-nets are a compact graphical model used to capture qualitative conditional preferences over features (variables) [Boutilier et al., 2004]. Qualitative preferences are important as there is experimental evidence that qualitative preferences may more accurately reflect humans' preferences in uncertain information settings [Popova et al., 2013; Allen et al., 2015]. CP-nets are a popular formalism for specifying preferences in the literature and have been used for a number of applications including recommender systems and product specification [Pu et al., 2011; Fattah et al., 2018]. Consider a car that is described by values for all its possible features: make, model, color, and stereo options. A CP-net consists of a dependency graph and a set of statements of the form "all else being equal, I prefer x to y." For example, in a CP-net one could say "Given that the car is a Honda Civic, I prefer red to yellow."; the condition sets the context for the preference over alternatives. These preferences are qualitative, i.e., there is no quantity expressing the degree of preference.
A CP-net induces an ordering over all possible outcomes, i.e., all complete assignments to the set of features. This is a partial order if the dependency graph of the CP-net is acyclic, i.e., the conditionality of the statements does not create a cycle, as is often assumed in work with CP-nets [Goldsmith et al., 2008]. The size of the description of the CP-net may be exponentially smaller than the partial order it describes. Hence, CP-nets are called a compact representation, and reasoning and learning on the compact structure, instead of the full order, is an important topic of research. Recent work proposes the first formal metric to describe the distance between CP-nets [Loreggia et al., 2018a] and the related formalism of LP-trees [Li and Kazimipour, 2018] in a rigorous way. What is important is not the differences in the surface features of the CP-nets, e.g., a single statement or dependency, but rather the distance between their induced partial orders. Even a small difference in a CP-net could generate a very different partial order, depending on which feature is involved in the modification. While the metrics proposed by Loreggia et al. [2018a] are well grounded, they are computationally hard to compute in general, and approximations must be used.

We envision the use of CP-nets to solve one part of the value alignment problem [Russell et al., 2015; Loreggia et al., 2018c; Loreggia et al., 2018b]. This is part of a broader research program we call Ethically Bounded AI, which seeks to understand how to harness the power of AI yet prevent these systems from making choices we do not consider ethical [Rossi and Mattei, 2019]. In this work we envision using a distance metric to measure the difference between an individual agent's preferences over actions and another ordering given by, e.g., ethics, norms, or business values [Loreggia et al., 2018c; Loreggia et al., 2018b].
Following work in metric learning over structured representations [Bellet et al., 2015], we wish to learn the distance between partial orders represented compactly as CP-nets. We do not want to work with the partial orders directly as they may be exponentially larger than the CP-net representation. Informally, given two CP-nets, we wish to estimate the distance between their induced partial orders using a neural network. The number of possible CP-nets grows extremely fast, from 481,776 for 4 binary features to over 5.24 × 10^40 with 7 binary features [Allen et al., 2017]. However, the computation time of the approximation algorithm proposed by Loreggia et al. [2018a] scales linearly with the number of features; hence new methods must be explored. Therefore, leveraging the inferential properties of neural networks may help us make CP-nets more useful as a preference reasoning formalism.

Contributions. We propose using metric learning as a tool to practically solve aspects of the value alignment problem. We propose to model both user preferences and ethical priorities over actions using the CP-net formalism, and we demonstrate how one can use a state-of-the-art neural network formulation, CPMETRIC, to quickly and accurately judge distances between preferred actions and ethical actions. We evaluate our models and metrics on generated CP-nets and show that CPMETRIC leads to a speed-up in computation while still being accurate.

2 CP-nets

Conditional Preference networks (CP-nets) are a graphical model for compactly representing conditional and qualitative preference relations [Boutilier et al., 2004]. CP-nets are comprised of sets of ceteris paribus preference statements (cp-statements). For instance, the cp-statement "I prefer red wine to white wine if meat is served" asserts that, given two meals that differ only in the kind of wine served and both containing meat, the meal with red wine is preferable to the meal with white wine. CP-nets have been extensively used in the preference reasoning, preference learning, and social choice literature as a formalism for working with qualitative preferences [Domshlak et al., 2011; Rossi et al., 2011; Brandt et al., 2016]. CP-nets have even been used to compose web services [Wang et al., 2009] and other decision aid systems [Pu et al., 2011].

Formally, a CP-net has a set of features (or variables) F = {X1, . . . , Xn} with finite domains D(X1), . . . , D(Xn). For each feature Xi, we are given a set of parent features Pa(Xi) that can affect the preferences over the values of Xi. This defines a dependency graph in which each node Xi has Pa(Xi) as its immediate predecessors. An acyclic CP-net is one in which the dependency graph is acyclic. Given this structural information, one needs to specify the preference over the values of each variable Xi for each complete assignment to the parent variables, Pa(Xi). This preference is assumed to take the form of a total or partial order over D(Xi). A cp-statement for some feature Xi that has parents Pa(Xi) = {x1, . . . , xn} and domain D(Xi) = {a1, . . . , am} is a total ordering over D(Xi) and has the general form x1 = v1, x2 = v2, . . . , xn = vn : a1 ≻ . . . ≻ am, where for each xi ∈ Pa(Xi), xi = vi is an assignment to a parent of Xi with vi ∈ D(xi). The set of cp-statements regarding a certain variable Xi is called the cp-table for Xi.
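As a concrete illustration, a cp-table can be read as a lookup from complete parent assignments to a total order over the variable's domain. The following sketch is our own illustration, not the authors' implementation (all names are hypothetical); it encodes two cp-tables from the example of Figure 1, introduced below:

```python
# A minimal sketch (not the authors' code): a cp-table as a Python dict
# mapping each complete parent assignment to a total order over the
# variable's domain, most preferred value first. "x_bar" stands for x̄.

# Unconditional preference for A (no parents): a ≻ ā.
cp_table_A = {(): ["a", "a_bar"]}

# Conditional preferences for C with parents A and B (see Figure 1):
cp_table_C = {
    ("a", "b"):         ["c", "c_bar"],   # (a ∧ b): c ≻ c̄
    ("a_bar", "b_bar"): ["c", "c_bar"],   # (ā ∧ b̄): c ≻ c̄
    ("a", "b_bar"):     ["c_bar", "c"],   # (a ∧ b̄): c̄ ≻ c
    ("a_bar", "b"):     ["c_bar", "c"],   # (ā ∧ b): c̄ ≻ c
}
```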
Consider the CP-net depicted graphically in Figure 1 (left) with features A, B, C, and D. Figure 1 (right) gives the full induced preference order for the CP-net. Each variable has a binary domain containing f and f̄ if F is the name of the feature. All cp-statements in the CP-net are: a ≻ ā; b ≻ b̄; (a ∧ b) : c ≻ c̄; (ā ∧ b̄) : c ≻ c̄; (a ∧ b̄) : c̄ ≻ c; (ā ∧ b) : c̄ ≻ c; c : d ≻ d̄; c̄ : d̄ ≻ d. Here, statement a ≻ ā represents the unconditional preference for A = a over A = ā, while statement c : d ≻ d̄ states that D = d is preferred to D = d̄ given that C = c. The semantics of CP-nets depends on the notion of a worsening flip: a change in the value of a variable to a less preferred value according to the cp-statement for that variable. For example, in the CP-net above, passing from abcd to abc̄d is a worsening flip since c is better than c̄ given a and b. One outcome α is preferred to, or dominates, another outcome β (written α ≻ β) if and only if there is a chain of worsening flips from α to β. This definition induces a preorder over the outcomes, which is a partial order if the CP-net is acyclic [Boutilier et al., 2004], as depicted in Figure 1 (right).

The complexity of dominance and consistency testing in CP-nets is an area of active study in preference reasoning [Goldsmith et al., 2008; Rossi et al., 2011]. Finding the optimal outcome of a CP-net is NP-hard [Boutilier et al., 2004] in general, but it can be found in polynomial time for acyclic CP-nets by assigning the most preferred value in each cp-table. Indeed, acyclic CP-nets induce a lattice over the outcomes as (partially) depicted in Figure 1 (right). The induced preference ordering, Figure 1 (right), can be exponentially larger than the CP-net, Figure 1 (left), which motivates learning a metric using only the (more compact) CP-net.

[Figure 1: A CP-net with n = 4 features (left) and part of the induced partial order (right). Note that the partial order is over all 2^n = 16 possible combinations and arrows denote the dominance relation. We have arranged the nodes so that each is one flip between the levels.]
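To make the flipping semantics concrete, the sketch below (again our own illustration with hypothetical names; the binary values 1 and 0 stand for f and f̄) enumerates worsening flips for the Figure 1 CP-net and tests dominance by searching for a chain of worsening flips:

```python
from collections import deque

# Sketch of worsening-flip semantics for the Figure 1 CP-net.
# 1 stands for the unbarred value (e.g., a), 0 for the barred one (ā).
VARS = ["A", "B", "C", "D"]
PARENTS = {"A": [], "B": [], "C": ["A", "B"], "D": ["C"]}
PREFERRED = {                           # preferred value per parent assignment
    ("A", ()): 1, ("B", ()): 1,         # a ≻ ā, b ≻ b̄
    ("C", (1, 1)): 1, ("C", (0, 0)): 1, # (a ∧ b), (ā ∧ b̄): c ≻ c̄
    ("C", (1, 0)): 0, ("C", (0, 1)): 0, # (a ∧ b̄), (ā ∧ b): c̄ ≻ c
    ("D", (1,)): 1, ("D", (0,)): 0,     # c: d ≻ d̄; c̄: d̄ ≻ d
}

def worsening_flips(outcome):
    """Yield all outcomes exactly one worsening flip away from outcome."""
    for i, v in enumerate(VARS):
        ctx = tuple(outcome[VARS.index(p)] for p in PARENTS[v])
        if outcome[i] == PREFERRED[(v, ctx)]:  # flipping now makes it worse
            flipped = list(outcome)
            flipped[i] = 1 - flipped[i]
            yield tuple(flipped)

def dominates(alpha, beta):
    """alpha ≻ beta iff a chain of worsening flips leads from alpha to beta."""
    if alpha == beta:
        return False
    frontier, seen = deque([alpha]), {alpha}
    while frontier:
        for nxt in worsening_flips(frontier.popleft()):
            if nxt == beta:
                return True
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return False

# abcd dominates abc̄d via the single worsening flip discussed above:
assert dominates((1, 1, 1, 1), (1, 1, 0, 1))
```

Note that this brute-force search visits up to all 2^n outcomes, consistent with the hardness of dominance testing cited above; it is illustrative only.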
3 Preferences and Ethical Priorities

In what follows we assume that we can describe a user's behavior through her preferences over the features of the domain. This can be modeled as a CP-net, and it gives us an ordering over the actions the user would like to take. The example given in Loreggia et al. [2018c] concerns the driver of a vehicle: they may want to go as fast as possible and run over certain small animals. However, there may be some overall ethical or moral guidelines (priorities) the system must follow. In this case we want to evaluate the difference, or distance, between the individual user and the society. This idea of morality as an ordering over actions was first proposed by Sen [1974].

To operationalize this system we wish to describe both preferences and priorities using the CP-net formalism, along with a notion of distance in the metric space of CP-nets. This enables us and the system to understand whether users' preferences are close enough to the moral principles or not. Eventually, when preferences deviate from the desired behavior, we can use CP-nets, since they induce an ordering, to find a trade-off so that the quality of the decision with respect to the subjective preferences does not significantly degrade when conforming to the ethical principles [Loreggia et al., 2018c; Loreggia et al., 2018b]. In this way we have bounded the behavior of the system to be ethical while still being responsive to the user preferences [Rossi and Mattei, 2019].

Traditional reasoning and learning approaches in AI provide different and complementary capabilities to an autonomous agent. Symbolic and logical reasoning allows these agents to manipulate symbols and perform inference, while machine learning techniques can learn and optimize many ill-defined problems from large amounts of data. As ongoing work, we intend to study the use of both kinds of approaches to model, learn, and reason with both preferences and ethical principles. Inspired by the System 1 and System 2 theory of Daniel Kahneman [Kahneman, 2011], we will define a dual-agent architecture that will provide autonomous agents with the ability to combine symbolic and accurate reasoning with data interpretation and learning, for both preferences and ethical principles. This will allow machines to be flexible and context-dependent in how they handle and combine these two sources of information for decision making [Rossi and Loreggia, 2019; Rossi and Mattei, 2019].

The combined use of deep learning techniques and logical reasoning formalisms is an exciting research direction to find principled ways to develop AI systems that are both accountable and able to explain themselves. We hope that these approaches will be able to overcome limitations of the "black-box paradigm" in the machine learning discipline [Rossi and Loreggia, 2019].

4 Metric Learning on CP-nets

Metric learning algorithms aim to learn a metric (or distance function) over a set of training points or samples [Sohn, 2016]. The importance of metrics has grown in recent years with the use of these functions in many different domains: from clustering to information retrieval and from recommender systems to preference aggregation. For instance, many clustering algorithms, like k-Means, and classification algorithms, including k-Nearest Neighbors, use a distance value between points.

Formally, a metric space is a pair (M, d) where M is a set of elements and d is a function d : M × M → R satisfying four criteria. Given any three elements A, B, C ∈ M, d must satisfy:
1. d(A, B) ≥ 0: there must be a value for all pairs;
2. d(A, B) = d(B, A): d must be symmetric;
3. d(A, B) ≤ d(A, C) + d(C, B): d must satisfy the triangle inequality; and
4. d(A, B) = 0 if and only if A = B: d is zero if and only if the two elements are the same.

Xing et al. [2002] first formalized the problem of metric learning, i.e., learning the metric directly from samples rather than formally specifying the function d. This approach requires training data, meaning that we have some oracle that is able to give the value of the metric for each pair. The success of deep learning in many different domains [Krizhevsky et al., 2012] has led many researchers to apply these approaches to the field of metric learning, resulting in a number of important results [Bellet et al., 2015].
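These four axioms can be spot-checked empirically for any candidate distance function. The sketch below is our own illustration (check_metric_axioms is a hypothetical helper, not part of CPMETRIC); a failure on any sampled pair or triple disproves metricity, while success on a sample is of course not a proof:

```python
import itertools

def check_metric_axioms(elements, d, tol=1e-9):
    """Spot-check the four metric axioms for d on a finite sample of
    elements. A single failure disproves that d is a metric; passing on
    a sample does not prove it."""
    for a, b in itertools.product(elements, repeat=2):
        assert d(a, b) >= -tol                     # (1) non-negativity
        assert abs(d(a, b) - d(b, a)) <= tol       # (2) symmetry
        assert (d(a, b) <= tol) == (a == b)        # (4) zero iff identical
    for a, b, c in itertools.product(elements, repeat=3):
        assert d(a, b) <= d(a, c) + d(c, b) + tol  # (3) triangle inequality

# Sanity check with a known metric on a small set of integers:
check_metric_axioms(range(5), lambda x, y: abs(x - y))
```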
In this work we focus on metric spaces (M, d) where M is a set of CP-nets. Given this, we want to learn the distance d which best approximates the Kendall tau distance (KTD) [Kendall, 1938] between the induced partial orders. Informally, the Kendall tau distance between two orderings is the number of pairs that are discordant, i.e., not ordered in the same way, in both orderings. This distance metric, extended to partial orders (Definition 1), was shown to be a metric on the space of CP-nets by Loreggia et al. [2018a]. To extend the classic KTD to CP-nets, a penalty parameter p defined for partial rankings [Fagin et al., 2006] was extended to the case of partial orders. Loreggia et al. [2018a] assume that all CP-nets are acyclic and in minimal (non-degenerate) form, i.e., all arcs in the dependency graph have a real dependency expressed in the cp-statements, a standard assumption in the CP-net literature (see, e.g., [Allen et al., 2017; Boutilier et al., 2004]).

Definition 1. Given two CP-nets A and B inducing partial orders P and Q over the same unordered set of outcomes U:

KTD(A, B) = KT(P, Q) = Σ_{i,j ∈ U, i ≠ j} K^p_{i,j}(P, Q),

where i and j are two outcomes with i ≠ j (i.e., we iterate over all unique pairs) and:
1. K^p_{i,j}(P, Q) = 0 if i, j are ordered in the same way, or are incomparable, in both P and Q;
2. K^p_{i,j}(P, Q) = 1 if i, j are ordered inversely in P and Q;
3. K^p_{i,j}(P, Q) = p, with 0.5 ≤ p < 1, if i, j are ordered in P and incomparable in Q (resp. ordered in Q and incomparable in P).

To make this distance scale invariant, i.e., a value in [0, 1], it is divided by |U|.
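A direct reading of Definition 1, for partial orders small enough to enumerate, might look like the following sketch (our own illustration; the pairwise-relation encoding is an assumption, and the paper's normalization step is noted in a comment):

```python
def ktd(rel_p, rel_q, outcomes, p=0.75):
    """Sketch of Definition 1. rel_x(i, j) returns 1 if i ≻ j, -1 if
    j ≻ i, and 0 if i and j are incomparable in that partial order.
    The penalty must satisfy 0.5 <= p < 1. The paper additionally
    divides by |U| to normalize the result into [0, 1]."""
    total = 0.0
    for a in range(len(outcomes)):
        for b in range(a + 1, len(outcomes)):   # each unordered pair once
            rp = rel_p(outcomes[a], outcomes[b])
            rq = rel_q(outcomes[a], outcomes[b])
            if rp == rq:
                continue      # ordered the same way, or incomparable in both
            elif rp != 0 and rq != 0:
                total += 1    # ordered inversely
            else:
                total += p    # ordered in one, incomparable in the other
    return total

# Tiny demo: P orders x ≻ y ≻ z; Q only orders x ≻ y, leaving z incomparable.
P = {("x", "y"): 1, ("x", "z"): 1, ("y", "z"): 1}
Q = {("x", "y"): 1}
rel = lambda order: (lambda i, j: order.get((i, j), -order.get((j, i), 0)))
print(ktd(rel(P), rel(Q), ["x", "y", "z"]))   # two p-penalties: 1.5
```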
CP-nets present two important challenges when used for metric learning. The first is that we are attempting to learn a metric via a compact representation of a partial order. We are not learning over the partial orders induced by the CP-nets directly, as they could be exponentially larger than the CP-nets. The second challenge is the encoding of the graphical structure itself. Graph learning with neural networks is still an active and open area of research; Goyal and Ferrara [2017] give a complete survey of recent work as well as a Python library of implementations for many of these techniques. Most of these works focus on finding good embeddings for the nodes of the network and then using collections of these learned embeddings to represent the graph for, e.g., particular segmentation or link prediction tasks. None of these techniques have been applied to embedding graphs for metric learning.

5 Structure of CPMETRIC

In our task the metric space is (M, d) where M is a set of compact, graphical preferences that induce a partial order, and our goal is to learn the metric d only from the compact, graphical representation. The key challenge is the need to find a vector representation of not only the graph but also the cp-statements. We briefly define the representations used for CPMETRIC here; for a complete overview, see Loreggia et al. [2019].

We represent a CP-net I over m features using two matrices. First is the adjacency matrix adj_I, which represents the dependency graph of the CP-net and is an m × m matrix of 0s and 1s. The second matrix represents the list of cp-statements, cpt_I, which is an m × 2^(m−1) matrix, where each row represents a variable Xi ∈ F and each column represents a complete assignment to the variables in F \ Xi. The list is built following a topological ordering of the variables in the CP-net. Each cell cpt_I(i, j) stores the preference value for the ith variable given the jth assignment to the variables in F \ Xi.

The set of training examples X = {x1, . . . , xn} is made up of pairs of CP-nets represented through their normalized Laplacians and their cp-statements. The set of corresponding labels is Y = {y1, . . . , yn}ᵀ, where each yi ∈ [0, 1] is the normalized value of KTD between the CP-nets in xi. Each xi ∈ X is then a tuple (L_A, cpt_A, L_B, cpt_B) representing a pair of CP-nets (A, B) by their Laplacians, L_A and L_B, and the encodings of their cp-statements, cpt_A and cpt_B.
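The sketch below is our own reconstruction under stated assumptions, not the authors' code: it builds the two matrices and a Laplacian for the binary Figure 1 CP-net. Storing the preferred value in each cpt_I cell and using the symmetric normalized Laplacian of the (undirected) dependency graph are both our assumptions; see Loreggia et al. [2019] for the exact construction.

```python
import numpy as np
from itertools import product

# Figure 1 CP-net; the variable order (A, B, C, D) is already topological.
names = ["A", "B", "C", "D"]
parents = {"A": [], "B": [], "C": ["A", "B"], "D": ["C"]}
preferred = {
    ("A", ()): 1, ("B", ()): 1,
    ("C", (1, 1)): 1, ("C", (0, 0)): 1, ("C", (1, 0)): 0, ("C", (0, 1)): 0,
    ("D", (1,)): 1, ("D", (0,)): 0,
}
m = len(names)

# adj_I: adj[i, j] = 1 iff variable i is a parent of variable j.
adj = np.zeros((m, m))
for j, v in enumerate(names):
    for p in parents[v]:
        adj[names.index(p), j] = 1

# cpt_I: one row per variable, one column per complete assignment to the
# other m - 1 variables (2^(m-1) columns for binary domains); each cell
# stores the preference value of the row variable under that assignment.
cpt = np.zeros((m, 2 ** (m - 1)))
for i, v in enumerate(names):
    others = [w for w in names if w != v]
    for j, assign in enumerate(product([0, 1], repeat=m - 1)):
        ctx = tuple(assign[others.index(p)] for p in parents[v])
        cpt[i, j] = preferred[(v, ctx)]

# Symmetric normalized Laplacian of the undirected dependency graph
# (our assumed normalization): L = I - D^{-1/2} A D^{-1/2}.
und = np.maximum(adj, adj.T)
deg = und.sum(axis=1)
d_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(deg, 1)))  # guard isolated nodes
laplacian = np.eye(m) - d_inv_sqrt @ und @ d_inv_sqrt
```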
6 Experiment

CPMETRIC is trained to learn the KTD metric by varying the number of features of the CP-nets n ∈ {3, . . . , 7} and using two different autoencoders. For a complete discussion of the performance of CPMETRIC on the standard classification and regression tasks in comparison with I-CPD, including a discussion of tuning hyperparameters, see Loreggia et al. [2019]. In this paper we focus on what we call the comparison task that is necessary for value alignment systems. Informally, this is the task where, given two CP-nets, say one representing the preferences of the user and one representing a set of ethical values or constraints, we want to decide if some third CP-net, say the course of action to be taken, is closer to the preferences or closer to the constraints.

6.1 Data Generation and Training

For each number of features n ∈ {3, . . . , 7} we generate 1000 CP-nets uniformly at random using the generators from Allen et al. [2017]. This set of CP-nets is split into a training-generative-set (900 CP-nets) and a test-generative-set (100 CP-nets) in 10 different ways to give us 10-fold cross validation. For each fold we compute the training and test dataset comprised of all, e.g., (900 choose 2), possible pairs of CP-nets from the training-generative-set and test-generative-set, respectively, along with the value of KTD for that pair. While we generate the CP-nets themselves uniformly at random, observe that this creates an unbalanced set of distances – it induces a normal distribution – and hence our sets are unbalanced. Figure 2 shows the distribution of CP-net pairs over 20 intervals for all CP-nets generated for n ∈ {3, . . . , 7}. While our classification experiments are for m = 10 classes, dividing the interval into 20 classes provides a better visualization of the challenge of obtaining training samples at the edges of the distribution.

[Figure 2: Histogram of the number of CP-net pairs per interval across all experimental datasets. CP-net pairs are not distributed uniformly in the class intervals.]

We ran a preliminary experiment on balancing our dataset by sub-sampling the training and test datasets. In these small experiments, performance was much worse than performance on the unbalanced dataset. Because we are learning a metric, for each CP-net A, there is only one CP-net B such that KTD(A, B) = 1 and only one CP-net C such that KTD(A, C) = 0. Consequently, attempting to balance or hold out CP-nets from test or train can lead to poor performance. We conjecture that in order to improve this task we should perform some kind of data augmentation, but this would introduce more subjective assumptions on how and where data should be augmented [Wong et al., 2016].

All training was done on a machine with 2 x Intel(R) Xeon(R) CPU E5-2670 @ 2.60GHz, 128GB of RAM, and one NVidia K20 GPU. We train CPMETRIC for 70 epochs using the Adam optimizer [Kingma and Ba, 2014]. For each number of features of the CP-net n we use all (900 choose 2) pairs in the training-set. There are only 488 binary CP-nets with 3 features [Allen et al., 2017]; hence, for n = 3 the training-set is 17K samples, while for n > 3 the number of samples in the training-set is 800K. Both the Autoencoder and Siamese Autoencoder models are trained for 100 epochs using the Adam optimizer [Kingma and Ba, 2014] on the same training-set. Model weights from the best performing epoch are saved and subsequently transferred to the deep neural network used to learn the distance function.

The training and validation loss for the autoencoder is shown in Figure 3. Observe that the loss for the CPT representation approaches zero after only 3 epochs for both the training and validation phases. The same trend holds for the adjacency matrix, though the loss converges to ≈ 0.15.

[Figure 3: Performance of the autoencoder on the validation and training set for 10 epochs.]

6.2 Comparison Task Performance

For many applications we are not concerned with the true value of the distance but rather with deciding which of two preferences is closer to a reference point. For example, in product recommendation we may want to display the closer of two objects and not care about computing the values [Pu et al., 2011]. Formally, the qualitative comparison task takes a set of CP-net triples (A, B, C), where A is a reference CP-net, and the task is to decide which other CP-net, B or C, is closer to A. We generate uniformly at random 1000 triples of CP-nets for each n ∈ {3, . . . , 7}. For each triple (A, B, C) we compute both KTD(A, B) and KTD(A, C) to establish ground truth and use our regression networks to predict the distances between (A, B) and (A, C).
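Evaluation on this task reduces to checking whether the learned distance and the ground-truth KTD agree on which candidate is closer. The sketch below is our own illustration of the scoring loop, with integers standing in for CP-nets and a hypothetical noisy predictor in place of the trained network:

```python
import random

def comparison_accuracy(triples, true_dist, pred_dist):
    """Fraction of triples (A, B, C) on which the predicted distance
    agrees with ground truth about whether B or C is closer to A."""
    correct = 0
    for a, b, c in triples:
        truth = true_dist(a, b) < true_dist(a, c)
        guess = pred_dist(a, b) < pred_dist(a, c)
        correct += (truth == guess)
    return correct / len(triples)

# Demo with stand-in objects and a noisy "model":
random.seed(0)
triples = [tuple(random.sample(range(100), 3)) for _ in range(1000)]
true_d = lambda x, y: abs(x - y)
noisy_d = lambda x, y: abs(x - y) + random.gauss(0.0, 2.0)
print(comparison_accuracy(triples, true_d, noisy_d))
```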
Table 1 displays the accuracy, as a percentage out of 1000 trials, of our three CPMETRIC architectures versus I-CPD for this task; Table 2 gives the average runtime per pair, averaged over all 1000 trials. The standard deviations in Table 1 are across the 10 folds of the training/test set. For all of our networks we obtain an accuracy above 85%, all the networks perform about the same on this task (±2.0%), and the trend for accuracy is flat across the size of the CP-nets. It is interesting to note that on this task neither of the autoencoders was able to significantly improve performance as they did for the quantitative comparison tasks. While the results are inconclusive, as all instances of CPMETRIC performed about the same, it will be interesting to see if there are autoencoder architectures that are more suited to the comparison task.

N   No Autoencoder   Autoencoder      Siamese Autoencoder   I-CPD
3   85.01% (2.01%)   85.76% (2.29%)   85.47% (2.32%)        91.80%
4   91.17% (0.92%)   91.38% (1.10%)   91.78% (1.13%)        92.90%
5   88.40% (0.91%)   89.36% (1.08%)   89.18% (1.08%)        90.80%
6   87.33% (0.80%)   87.17% (1.33%)   86.79% (1.84%)        90.10%
7   84.79% (1.16%)   84.57% (1.14%)   85.12% (0.86%)        89.90%

Table 1: Accuracy on triples for the various network architectures on the qualitative comparison task, as well as the performance of I-CPD. While our networks do not achieve the best performance on this task, they are competitive with the more costly approximation algorithm I-CPD.

A positive takeaway is that, as Table 2 shows, we achieve a sub-linear increase in inference time for our model. I-CPD scales linearly with the description size of the CP-net, so the neural network does, after training, offer the ability to compare CP-nets of larger sizes in a practical amount of time. This gives us hope that while the metric itself is NP-hard to compute in a direct way, we can use the power of deep learning to enable systems that could be practically deployed.

N   I-CPD              Autoencoder Neural Network
3   0.69 (0.48) msec   0.087 (0.004) msec
4   1.09 (0.33) msec   0.098 (0.004) msec
5   1.85 (0.49) msec   0.100 (0.005) msec
6   3.16 (0.74) msec   0.114 (0.003) msec
7   4.65 (0.86) msec   0.138 (0.001) msec

Table 2: Comparison of the mean runtime for a single triple over 1000 trials on the qualitative comparison task for the neural network and I-CPD.

Unfortunately, the trend for accuracy is negative when the number of features increases. However, this is also the case for the I-CPD approximation, and both metrics seem to be losing accuracy at about the same rate. The loss in accuracy for the neural network models could be caused by the unbalanced nature of the training and testing datasets. Again, as shown in Figure 2, generating the CP-nets themselves uniformly at random does not give us a uniform distribution over distances, and correcting this may give better performance.

Our conjecture for the slight disadvantage of CPMETRIC relative to I-CPD has to do with the directionality of the errors. When training the network we are optimizing for accuracy on the regression task, where it does not matter whether CPMETRIC overestimates or underestimates by a small amount. However, for the comparison task it may matter a lot if the direction of our errors is random. An important direction for future work is to try different optimization objectives when training the network to see if this bias is the reason for the underperformance.

7 Conclusion

In this paper we have discussed how to use CPMETRIC, a novel neural network model, to learn a metric (distance) function between the partial orders induced by CP-nets, a compact, structured preference representation, to enable practical value alignment systems. To our knowledge this is the first use of neural networks to learn and measure preferences for the value alignment problem. We feel that this is an interesting and fruitful direction for research in the AI Safety domain, as we must develop practical and efficient tools that can be used to effectively harness the power of AI systems.

Important directions for future work include integrating novel graph learning techniques into our networks and extending our work to other formalisms including, e.g., PCP-nets [Cornelio et al., 2013] and LP-trees [Li and Kazimipour, 2018]. PCP-nets are a particularly interesting direction as they have been proposed as an efficient way to model uncertainty over the preferences of a single or multiple agents [Cornelio et al., 2015]. Another important extension involves setting contexts for different preference and ethical priority encodings. CP-nets and many other preference formalisms model a particular domain but do not give us any insight into when we may need to pass between one context and another.
References

[Allen et al., 2015] T. E. Allen, M. Chen, J. Goldsmith, N. Mattei, A. Popova, M. Regenwetter, F. Rossi, and C. Zwilling. Beyond theory and data in preference modeling: Bringing humans into the loop. In Proc. 4th ADT, 2015.
[Allen et al., 2017] T. E. Allen, J. Goldsmith, H. E. Justice, N. Mattei, and K. Raines. Uniform random generation and dominance testing for CP-nets. Journal of Artificial Intelligence Research, 59:771–813, 2017.
[Amor et al., 2016] N. B. Amor, D. Dubois, H. Gouider, and H. Prade. Graphical models for preference representation: An overview. In Proc. 10th International Conference on Scalable Uncertainty Management (SUM), pages 96–111, 2016.
[Bache and Lichman, 2013] K. Bache and M. Lichman. UCI Machine Learning Repository, 2013. University of California, Irvine, School of Information and Computer Sciences.
[Bellet et al., 2015] A. Bellet, A. Habrard, and M. Sebban. Metric Learning. Synthesis Lectures on Artificial Intelligence and Machine Learning. Morgan & Claypool Publishers, 2015.
[Boutilier et al., 2004] C. Boutilier, R. Brafman, C. Domshlak, H. H. Hoos, and D. Poole. CP-nets: A tool for representing and reasoning with conditional ceteris paribus preference statements. Journal of Artificial Intelligence Research, 21:135–191, 2004.
[Brandt et al., 2016] F. Brandt, V. Conitzer, U. Endriss, J. Lang, and A. D. Procaccia, editors. Handbook of Computational Social Choice. Cambridge University Press, 2016.
[Cornelio et al., 2013] C. Cornelio, J. Goldsmith, N. Mattei, F. Rossi, and K. B. Venable. Updates and uncertainty in CP-nets. In Proc. 26th AUSAI, 2013.
[Cornelio et al., 2015] C. Cornelio, U. Grandi, J. Goldsmith, N. Mattei, F. Rossi, and K. B. Venable. Reasoning with PCP-nets in a multi-agent context. In Proc. 14th AAMAS, 2015.
[Domshlak et al., 2011] C. Domshlak, E. Hüllermeier, S. Kaci, and H. Prade. Preferences in AI: An overview. Artificial Intelligence, 175(7):1037–1052, 2011.
[Fagin et al., 2006] R. Fagin, R. Kumar, M. Mahdian, D. Sivakumar, and E. Vee. Comparing partial rankings. SIAM Journal on Discrete Mathematics, 20(3):628–648, 2006.
[Fattah et al., 2018] S. M. M. Fattah, A. Bouguettaya, and S. Mistry. A CP-net based qualitative composition approach for an IaaS provider. In Proc. International Conference on Web Information Systems Engineering, pages 151–166. Springer, 2018.
[Fürnkranz and Hüllermeier, 2010] J. Fürnkranz and E. Hüllermeier. Preference Learning. Springer, 2010.
[Goldsmith and Junker, 2009] J. Goldsmith and U. Junker. Preference handling for artificial intelligence. AI Magazine, 29(4), 2009.
[Goldsmith et al., 2008] J. Goldsmith, J. Lang, M. Truszczyński, and N. Wilson. The computational complexity of dominance and consistency in CP-nets. Journal of Artificial Intelligence Research, 33(1):403–432, 2008.
[Goyal and Ferrara, 2017] P. Goyal and E. Ferrara. Graph embedding techniques, applications, and performance: A survey. CoRR, abs/1705.02801, 2017.
[Kahneman, 2011] D. Kahneman. Thinking, Fast and Slow. Farrar, Straus and Giroux, New York, 2011.
[Kambhampati, 2019] S. Kambhampati. Synthesizing explainable behavior for human-AI collaboration. In Proc. 18th AAMAS, 2019.
[Kendall, 1938] M. G. Kendall. A new measure of rank correlation. Biometrika, 30(1/2):81–93, 1938.
[Kingma and Ba, 2014] D. P. Kingma and J. Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
[Krizhevsky et al., 2012] A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems (NeurIPS), pages 1097–1105, 2012.
[Li and Kazimipour, 2018] M. Li and B. Kazimipour. An efficient algorithm to compute distance between lexicographic preference trees. In Proc. 27th IJCAI, pages 1898–1904, 2018.
[Loreggia et al., 2018a] A. Loreggia, N. Mattei, F. Rossi, and K. B. Venable. On the distance between CP-nets. In Proc. 17th AAMAS, 2018.
[Loreggia et al., 2018b] A. Loreggia, N. Mattei, F. Rossi, and K. B. Venable. Preferences and ethical principles in decision making. In Proc. 1st AAAI/ACM Conference on AI, Ethics, and Society (AIES), 2018.
[Loreggia et al., 2018c] A. Loreggia, N. Mattei, F. Rossi, and K. B. Venable. Value alignment via tractable preference distance. In R. V. Yampolskiy, editor, Artificial Intelligence Safety and Security, chapter 18. CRC Press, 2018.
[Loreggia et al., 2019] A. Loreggia, N. Mattei, F. Rossi, and K. B. Venable. CPMETRIC: Deep Siamese networks for learning distances between structured preferences. arXiv preprint arXiv:1809.08350, 2019.
[Mattei and Walsh, 2013] N. Mattei and T. Walsh. PrefLib: A library for preferences, http://www.preflib.org. In Proc. 3rd ADT, 2013.
[Mattei and Walsh, 2017] N. Mattei and T. Walsh. A PrefLib.org retrospective: Lessons learned and new directions. In U. Endriss, editor, Trends in Computational Social Choice, chapter 15, pages 289–309. AI Access Foundation, 2017.
[Popova et al., 2013] A. Popova, M. Regenwetter, and N. Mattei. A behavioral perspective on social choice. Annals of Mathematics and Artificial Intelligence, 68(1–3):135–160, 2013.
[Pu et al., 2011] P. Pu, B. Faltings, L. Chen, J. Zhang, and P. Viappiani. Usability guidelines for product recommenders based on example critiquing research. In F. Ricci, L. Rokach, B. Shapira, and P. B. Kantor, editors, Recommender Systems Handbook, pages 511–545. Springer, 2011.
[Rossi and Loreggia, 2019] F. Rossi and A. Loreggia. Preferences and ethical priorities: Thinking fast and slow in AI. In Proc. 18th AAMAS, pages 3–4, 2019.
[Rossi and Mattei, 2019] F. Rossi and N. Mattei. Building ethically bounded AI. In Proc. 33rd AAAI, 2019.
[Rossi et al., 2011] F. Rossi, K. B. Venable, and T. Walsh. A Short Introduction to Preferences: Between Artificial Intelligence and Social Choice. Morgan & Claypool, 2011.
[Russell et al., 2015] S. Russell, D. Dewey, and M. Tegmark. Research priorities for robust and beneficial artificial intelligence. AI Magazine, 36(4):105–114, 2015.
[Sen, 1974] A. Sen. Choice, Ordering, and Morality. Blackwell, 1974.
[Sohn, 2016] K. Sohn. Improved deep metric learning with multi-class n-pair loss objective. In Advances in Neural Information Processing Systems (NeurIPS), pages 1857–1865, 2016.
[Wang et al., 2009] H. Wang, S. Shao, X. Zhou, C. Wan, and A. Bouguettaya. Web service selection with incomplete or inconsistent user preferences. In Proc. 7th International Conference on Service-Oriented Computing, pages 83–98. Springer, 2009.
[Wong et al., 2016] S. C. Wong, A. Gatt, V. Stamatescu, and M. D. McDonnell. Understanding data augmentation for classification: When to warp? In Proc. 2016 International Conference on Digital Image Computing: Techniques and Applications (DICTA), pages 1–6, 2016.
[Xing et al., 2002] E. P. Xing, A. Y. Ng, M. I. Jordan, and S. J. Russell. Distance metric learning with application to clustering with side-information. In Proc. 15th NeurIPS, pages 505–512, 2002.