<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Prediction via Neural Networks</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Sunghyun Kim</string-name>
          <email>sunghyun@mit.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Minje Jang</string-name>
          <email>jangminje427@krafton.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Changho Suh</string-name>
          <email>chsuh@kaist.ac.kr</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Recommendation in Complex Environments (ComplexRec) Joint Workshop @ RecSys 2021</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Korea Advanced Institute of Science and Technology</institution>
          ,
          <addr-line>291 Daehak-ro, Yuseong-gu, Daejeon, 34141</addr-line>
          ,
          <country country="KR">South Korea</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Krafton</institution>
          ,
          <addr-line>231 Teheran-ro, Gangnam-gu, Seoul, 06142</addr-line>
          ,
          <country country="KR">South Korea</country>
        </aff>
        <aff id="aff4">
          <label>4</label>
          <institution>Massachusetts Institute of Technology</institution>
          ,
          <addr-line>77 Massachusetts Avenue, Cambridge, MA, 02139</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>We consider the group match prediction problem where the goal is to estimate the probability of one group of items being preferred over another, based on partially observed group comparison data. Most prior algorithms, each tailored to a specific statistical model, suffer from inconsistent performances across different scenarios. Motivated by a key common structure of the state-of-the-art, which we call the reward-penalty structure, we develop a unified framework that achieves consistent performances across a wide range of scenarios. Our distinction lies in introducing neural networks embedded with the reward-penalty structure. Extensive experiments on synthetic and real-world datasets show that our framework consistently leads to the best performance, while the state-of-the-art perform inconsistently.</p>
      </abstract>
      <kwd-group>
        <kwd>group match prediction</kwd>
        <kwd>group recommendation</kwd>
        <kwd>neural network</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>We often compare a pair of items (or groups of items) and are interested in predicting comparison outcomes for unobserved pairs, based on partial comparison data for previously observed pairs, and in recommending select options in preference to the others. We investigate the group match prediction (group recommendation) problem where we aim to estimate the probability of one group of items being preferred over another, based on partially observed group comparison data.</p>
      <p>Most prior algorithms postulate ground-truth utilities for items, and assume underlying models to exist. These models are considered to describe the utility of a group as a function of the utilities of its items, and to govern statistical patterns of comparison data based on the group utilities. Prior algorithms have been developed for prominent yet specific statistical models, and shown to attain optimal performances (see Section 4.1).</p>
      <p>Yet, major challenges arise when we attempt to employ them in practice. They suffer from inconsistent performances. Tailored to specific models, they achieve decent performances in some scenarios that are well-represented by their models, but perform poorly in others. This inconsistency limits the application of each algorithm to a narrow range of scenarios.</p>
      <p>Main contributions. First, we identify a key common structure of the state-of-the-art, which we call the reward-penalty structure: an item is rewarded for winning (or being more preferred) in a disadvantageous match, and penalized for losing (or being less preferred) in an advantageous one. Also, the magnitudes of such rewards and penalties are proportional to its contribution to its group. It turns out that across different state-of-the-art algorithms, only the terms corresponding to rewards and penalties vary (Section 4.1 for details), but the common mechanism remains unchanged. We take it as the main structure in our framework ((5) in Section 4.2 for details).</p>
      <p>Neural networks: Motivated by the above observations, we incorporate three separate neural network modules into our framework (Sections 4.2 and 4.3 for architectural details): two modules represent reward and penalty mechanisms, and the other combines them to produce final comparison outcome predictions. These modules are trained according to the given dataset so that they can adapt to any hidden models that underlie the dataset, making our framework attain universal applicability. The overall architecture that puts all three modules together is specifically designed to maintain the reward-penalty structure, in order to retain high performances across various scenarios.</p>
      <p>Scalability: A prior work employed a single-layer neural network as an initial effort to predict winning probabilities of group matches [<xref ref-type="bibr" rid="ref7">7</xref>]. This attempt showed a promising result, demonstrating improved prediction accuracy on a real-world online game dataset. However, it exhibits scalability issues. It requires one input node per item, thus is too prohibitive to be extended for scenarios with large numbers of items. In contrast, we design our neural network modules carefully to avoid such scalability issues (Figures 1 and 2: the input dimensions are fixed regardless of the number of items). Our framework also provides higher performance compared to the prior work.</p>
      <p>Extensive experiments: We compare our framework with the single-layer baseline and other algorithms by extensive experiments on synthetic and real-world datasets. Using synthetic datasets (Section 5.1), we show that our framework achieves the best performance across four prominent models in the literature, while the other algorithms suffer from inconsistent performances across different models. Three models consider extensions of the Bradley-Terry-Luce model [<xref ref-type="bibr" rid="ref8">8</xref>] to the group comparison scenario. The other considers a generalized version of the Thurstone model [<xref ref-type="bibr" rid="ref4">4</xref>] widely used in skill rating systems of online games. Using real-world datasets (Section 5.2), we show that our framework performs consistently well across diverse real-world scenarios. We consider five real-world datasets (sources in Footnote 7). One is a crowd-sourced image classification dataset, another is a collection of movie ratings, and the other three are online game match records. In our performance evaluations, we use the cross entropy loss (see (1)) and the prediction accuracy (see (8)).</p>
      <p>Ablation studies: To verify that the reward-penalty structure in our framework plays a key role in bringing performance improvements, we conduct ablation studies (Table 2 in Section 5.2). As our baseline, we consider our framework where the reward and penalty modules are arbitrarily made defunct (Section 5.2 for details). Evaluating the performances of our framework and the baseline on all five real-world datasets in terms of both cross entropy loss and prediction accuracy, we demonstrate that the reward and penalty modules consistently improve performances across all datasets and metrics, confirming their effectiveness.</p>
    </sec>
    <sec id="sec-related">
      <title>2. Related Work</title>
      <p>The most relevant prior works are [<xref ref-type="bibr" rid="ref1 ref2 ref3 ref4">1, 2, 3, 4</xref>]. These works assume prominent models (some long-established and some widely used in practice) for in-group interactions and group comparisons, and carry out statistical analysis to a great extent. We compare our framework with the algorithms developed therein.</p>
      <p>The problem of estimating individual item utilities from group comparison data has been investigated in [<xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>]. They considered extensions of the BTL model where the group utility is either the sum or the product of its items’ utilities, and group comparisons follow the BTL model. We call the two models the BTL-sum and BTL-product models respectively.</p>
      <p>A more advanced in-group interaction model has been developed in [<xref ref-type="bibr" rid="ref3">3</xref>]. They considered a scenario where a pair of individual items in a group leads to a synergy. The group utility is represented as the sum of two quantities, which we call the HOI model: (1) the sum of individual item utilities and (2) the sum of the products of all pairs of individual item utilities. The work of [<xref ref-type="bibr" rid="ref9">9</xref>] has considered a scenario where any tuple of items leads to a synergy. In [<xref ref-type="bibr" rid="ref4">4</xref>], they assumed individual item utilities to be centered around a mean following a Gaussian distribution and viewed the group utility as their sum, which we call the Thurstone model. Their algorithm is widely used in skill rating systems of online games where groups of users compete.</p>
      <p>The work of [<xref ref-type="bibr" rid="ref7">7</xref>] has made an initial effort in employing a neural network to predict winning probabilities of group matches. It has been shown that a single-layer neural network can fit some variants of the BTL model [<xref ref-type="bibr" rid="ref2">2</xref>] and improve prediction accuracy through experiments on a real-world online game dataset. As noted, one clear distinction compared to our work lies in scalability. The neural network developed in the prior work requires one input node per item, thus is prohibitive for scenarios with large numbers of items. In contrast, we develop a neural network that requires only a fixed number of input nodes regardless of the number of items (Section 4.2 for details).</p>
    </sec>
    <sec id="sec-2">
      <title>3. Problem Setup</title>
      <p>The goal is to predict comparison outcome likelihoods for unobserved pairs of groups, given partial group comparison data. Each data point consists of (A, B, y_AB). A and B are groups of M individual items. y_AB indicates which group is preferred: y_AB = 1(A ≻ B), where A ≻ B indicates A preferred over B. We denote the set of observed comparisons by D_obs and that of unobserved comparisons by D_unobs. Our training dataset available is {(A, B, y_AB)}_(A,B)∈D_obs. We use the cross entropy loss as our metric. Let us define ŷ_AB as the estimate of Pr[y_AB = 1] produced by our algorithmic framework. Then formally, our goal is to minimize:</p>
      <p>−(1/|D_obs|) ∑_(A,B)∈D_obs y_AB log ŷ_AB + (1 − y_AB) log(1 − ŷ_AB). (1)</p>
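      <p>The loss in (1) can be computed in a few lines; the following is a minimal sketch in plain Python, with illustrative sample and estimate containers that are not part of the paper.</p>

```python
import math

def cross_entropy_loss(samples, y_hat):
    # samples: list of (A, B, y) with y = 1 when A is preferred over B.
    # y_hat: dict mapping (A, B) -> estimated Pr[y = 1].
    total = 0.0
    for A, B, y in samples:
        p = y_hat[(A, B)]
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(samples)

# Toy example: two observed comparisons, both predicted correctly
# with confidence 0.8, so the average loss is -log(0.8).
obs = [(("a", "b"), ("c", "d"), 1), (("c", "d"), ("e", "f"), 0)]
est = {(("a", "b"), ("c", "d")): 0.8, (("c", "d"), ("e", "f")): 0.2}
loss = cross_entropy_loss(obs, est)  # -log(0.8), about 0.223
```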
      <p>Notation. [n] = {1, 2, … , n} is the set of all items. We use lowercase letters (e.g., i) for individual items, and uppercase letters (e.g., A) for sets of items. {x_i}_i∈A is the set of x_i’s for all i ∈ A. [x_i]_i∈A is a vector of x_i’s for all i ∈ A and its ordering is provided in the context. x̂ is an estimate of x. We use subscripts as in x_AB when x concerns a comparison between A and B. We use superscripts as in x^(ℓ) when x is updated iteratively. We use bold symbols as in w to indicate the vector of w_i’s for all i ∈ [n].</p>
    </sec>
    <sec id="sec-3">
      <title>4. Proposed Algorithm</title>
      <sec id="sec-3-1">
        <title>4.1. Motivation</title>
        <p>Our framework design is inspired by optimal state-of-the-art algorithms (achieving minimal sample complexity or global minima of cross entropy loss) in extensions of the Bradley-Terry-Luce (BTL) model [<xref ref-type="bibr" rid="ref8">8</xref>]. We make the following key observations regarding the structural similarities shared by the state-of-the-art:
(a) They exhibit “reward” and “penalty” terms in the estimation process (details in (2) and (3)). In updating an item’s utility estimate, they reward the item for contributing to its group’s winning, and penalize it for contributing to its group’s losing.
(b) The expressions of these reward/penalty terms vary as the underlying models change.
(c) The magnitudes of rewards and penalties depend on the power dynamics between groups. A greater reward is given to an item when its group is relatively weaker compared to the opponent group, and likewise a greater penalty is given when its group is relatively stronger.
(d) Their magnitudes depend on the portion of an item’s contribution (or blame) within its group. Suppose an item’s group wins (or loses) in a comparison. The item is given a greater reward (or penalty) when its share of contribution (or blame) within the group is relatively greater.</p>
        <p>Reward-Penalty Structure. Let us examine the algorithms developed in extensions of the BTL model.
• Rank Centrality [<xref ref-type="bibr" rid="ref5">5</xref>] has been developed under the standard BTL model where individual items are compared in pairs. It achieves the minimal sample complexity for top-K rank aggregation, whose task is to estimate the set of top-K items, in certain regimes [<xref ref-type="bibr" rid="ref10 ref11">10, 11</xref>]. As in Section 3, we define y_ij = 1(i ≻ j), given a pair of items i and j. By some manipulation, the update rule at step ℓ for individual item utility w_i^(ℓ) of item i can be shown as¹:</p>
        <p>w_i^(ℓ+1) ← w_i^(ℓ) + η ∑_j:(i,j)∈D_obs (y_ij w_j^(ℓ) − (1 − y_ij) w_i^(ℓ)). (2)</p>
        <p>¹As in [<xref ref-type="bibr" rid="ref5">5</xref>], η = 1/max_i d_i where d_i is the number of distinct items to which item i is compared. Also, we describe Rank Centrality as an iterative algorithm (one way to obtain the stationary distribution of the empirical pairwise preference matrix) to highlight its inherent reward-and-penalty structure.</p>
        <p>For item i, one can interpret w_j^(ℓ) as the reward since it increases w_i^(ℓ+1) when i ≻ j (y_ij = 1), and w_i^(ℓ) (next to (1 − y_ij)) as the penalty since it decreases w_i^(ℓ+1) when i ≺ j (y_ij = 0). η is a step size in the update.</p>
        <p>Note that the reward is large when the opponent’s utility estimate is large; we give a large reward for a win against a strong opponent (observation (c) above).</p>
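        <p>The reward-penalty form of (2) can be sketched as follows; a toy illustration assuming a dict of utilities and a list of pairwise outcomes, not the paper’s implementation (which operates on the empirical pairwise preference matrix).</p>

```python
def rank_centrality_step(w, comparisons, eta):
    # w: dict item -> utility estimate w_i at the current step.
    # comparisons: list of (i, j, y_ij) with y_ij = 1 when i beats j.
    # Reward: the opponent's utility when the item wins;
    # penalty: the item's own utility when it loses, as in Eq. (2).
    new_w = dict(w)
    for i, j, y in comparisons:
        new_w[i] += eta * (y * w[j] - (1 - y) * w[i])
        new_w[j] += eta * ((1 - y) * w[i] - y * w[j])
    return new_w

w = {"x": 0.5, "y": 0.5}
w = rank_centrality_step(w, [("x", "y", 1)], eta=0.1)
# "x" won: its estimate rises by eta * w_y; "y" falls by eta * w_x.
```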
        <p>Likewise, the penalty is large when its own utility estimate is large; we give a large penalty for a loss against a weak opponent (observation (c) above).
• Majorization-Minimization (MM) for the BTL-sum model has been developed in [<xref ref-type="bibr" rid="ref1 ref6">6, 1</xref>]. We define S_A^(ℓ) := ∑_i∈A w_i^(ℓ). By some manipulation, the update rule at step ℓ for individual item utility w_i^(ℓ) of item i can be shown as in (3), with step size η_i^(ℓ) = (∑_(A,B)∈D_obs:i∈A∪B (S_A^(ℓ) + S_B^(ℓ))^(−1))^(−1). The reward is large when the opponent group’s utility estimate is large (see S_B^(ℓ) in the numerator of the reward term R(i, ℓ); observation (c) above). The same holds for the penalty. Note also that the larger the contribution of item i within its own group (see w_i^(ℓ)/S_A^(ℓ) of R(i, ℓ)), the greater the reward (observation (d) above).
• MM for the BTL-product model has been developed in [<xref ref-type="bibr" rid="ref2">2</xref>] and shown to achieve global minima in terms of cross entropy loss. Its individual item utility update rule is described as in (3) but the reward and penalty terms are different.³ A similar interpretation applies, and all observations (a)–(d) can be found.</p>
      </sec>
      <sec id="sec-3-2">
        <title>4.2. Modules R and P</title>
        <p>We introduce two separate modules to represent rewards and penalties respectively. We employ neural networks for the modules so they can adapt to the underlying models for any given scenario and dataset.</p>
        <p>Structure: Figure 1 depicts the detailed architecture of modules R and P. They take as input the concatenation of [w_i^(ℓ)]_i∈A and [w_j^(ℓ)]_j∈B, and R (P respectively) produces as output the current reward (penalty respectively) estimates for the items. All layers are fully connected.</p>
        <p>Operation: Recall that our comparison data samples do not include utility estimate information. Each sample only specifies the pair of groups compared (A, B) and the comparison outcome y_AB. Thus, we begin with a randomly initialized utility estimate vector w^(0) ∈ ℝⁿ (initialization details in Section 4.4). Starting from w^(0), we obtain w^(L) ∈ ℝⁿ by applying modules R and P repeatedly L times. At step ℓ, given w^(ℓ), R and P produce positive real values in [0, 1]. They are reward and penalty values respectively, and are used to obtain w^(ℓ+1) as per (5). Figure 3 illustrates the process. See the first stage therein.</p>
        <p>Scalability: The input and output dimensions (2M) are independent of the total number of items (n). Thus, our framework does not suffer from scalability issues, in contrast to a prior neural network based approach [<xref ref-type="bibr" rid="ref7">7</xref>] in which the input dimension of the neural network scales in proportion to n. This scalability issue for the prior approach manifests in our real-world datasets with large numbers of items. See our real-world data experiment results for the GIFGIF and IMDb 5000 datasets in Table 1 in Section 5.2.</p>
        <p>Refinement: We apply R and P multiple times (L &gt; 1) to obtain a final utility estimate vector w^(L). The best value of L is set via hyper-parameter tuning. This series of applications is to “refine” utility estimates.</p>
        <p>Data augmentation: R and P take as input a concatenation of two vectors [w_i^(ℓ)]_i∈A and [w_j^(ℓ)]_j∈B. As they are vectors, not sets, ordering matters. We apply data augmentation to make our framework robust against arbitrary item orderings within a group. Specifically, given a sample (A = (1, 2), B = (3, 4), y_AB = 1), we create extra samples such as (A′ = (2, 1), B′ = (4, 3), y_A′B′ = 1). We also seek robustness against arbitrary orderings of the groups. Specifically, we create extra samples by changing the order of the two sets A and B as in (A = (3, 4), B = (1, 2), y_AB = 0) and A′ and B′ as in (A′ = (4, 3), B′ = (2, 1), y_A′B′ = 0). These techniques help train R and P so as to be robust against item orderings within a group and group orderings.</p>
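        <p>The two augmentation techniques can be sketched as follows; a toy illustration in plain Python, with tuples standing in for ordered groups.</p>

```python
from itertools import permutations

def augment(A, B, y):
    # Within-group reorderings keep the label; swapping the two
    # groups flips it. Returns the set of augmented samples.
    out = set()
    for pa in permutations(A):
        for pb in permutations(B):
            out.add((pa, pb, y))        # item orderings within groups
            out.add((pb, pa, 1 - y))    # group order swapped
    return out

aug = augment((1, 2), (3, 4), 1)
# e.g. ((2, 1), (4, 3), 1) and ((4, 3), (2, 1), 0) are both included.
```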
        <p>Ablation study: R and P are the most distinctive features of our framework. To evaluate their effectiveness, we conduct ablation studies. See Table 2 in Section 5.2.</p>
        <p>Given the refined utility estimates, module G produces the winning probability estimate of A preferred over B:</p>
        <p>ŷ_AB = G([w_i^(L)]_i∈A, [w_j^(L)]_j∈B). (7)</p>
        <p>As in (6), module G takes as input a concatenation of two vectors [w_i^(L)]_i∈A and [w_j^(L)]_j∈B. The item and group orderings for the input to G are the same as those for the input to R and P. As mentioned, the item utilities used as input to module G are obtained from the operation of modules R and P.</p>
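        <p>A minimal sketch of the prediction step (7): a fully connected layer with ReLU activation followed by a sigmoid output, over a fixed input dimension 2M. The width and random weights below are illustrative placeholders, not the trained module.</p>

```python
import math
import random

M = 5        # group size; the input dimension 2M is fixed,
HIDDEN = 16  # independent of the total number of items n.

random.seed(0)
W1 = [[random.gauss(0, 0.3) for _ in range(2 * M)] for _ in range(HIDDEN)]
W2 = [random.gauss(0, 0.3) for _ in range(HIDDEN)]

def g_module(w_A, w_B):
    # Concatenate the two groups' utility estimates, pass them through
    # a ReLU layer, and squash to a winning probability via a sigmoid.
    x = list(w_A) + list(w_B)
    h = [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in W1]
    z = sum(w * hi for w, hi in zip(W2, h))
    return 1.0 / (1.0 + math.exp(-z))

p = g_module([0.9, 0.8, 0.7, 0.6, 0.5], [0.1, 0.2, 0.3, 0.4, 0.5])
# p is a winning probability estimate in (0, 1).
```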
      </sec>
      <sec id="sec-3-3">
        <title>4.3. Module G</title>
        <p>Structure: Figure 2 depicts the detailed architecture of module G. The input and output of module G are of dimension 2M and a scalar respectively. As in modules R and P, we set the value of M as per the given scenario of interest. As the dimensions are independent of the total number of items (n), the module does not suffer from scalability issues. All layers are fully connected. The activations between layers are rectified linear units [<xref ref-type="bibr" rid="ref12">12</xref>]. The final activation is the sigmoid function [<xref ref-type="bibr" rid="ref13">13</xref>].</p>
        <p>Operation: Given a group comparison (A, B), the module takes as input the utility estimates of the items in groups A and B, and produces as output the winning probability estimate of A preferred over B. The three modules interact closely to perform the task. The overall architecture that ties them all will soon be depicted at the end of this section.</p>
        <p>Training: Let D_val be a fraction (1%–2%) of D_obs and use it for validation purposes. We divide D_train into T equal-size batches. We denote by D_train^t the data in batch t ∈ [T].
(a) Initialization: Initialize w^(0) using a Gaussian distribution whose mean is zero and variance is the normalized identity matrix, and the parameters of R, P and G using the Xavier initialization [<xref ref-type="bibr" rid="ref14">14</xref>].
(b) Refinement: For all samples (A, B) ∈ D_train^t, start with {([w_i^(0)]_i∈A, [w_j^(0)]_j∈B)}, apply the R and P modules L times, and obtain {([w_i^(L)]_i∈A, [w_j^(L)]_j∈B)}.
(c) Prediction: For all samples (A, B) ∈ D_train^t, start with {([w_i^(L)]_i∈A, [w_j^(L)]_j∈B)} obtained in (b), apply the G module, and obtain {ŷ_AB}.
(d) Model parameter update: Update the parameters of R, P, and G via the Adam optimizer [<xref ref-type="bibr" rid="ref15">15</xref>] to minimize the loss. To compute the loss, replace D_obs in (1) by D_train^t. Apply weight decay regularization with a factor of 0.01. Repeat (b)–(d) for all batches t ∈ [T].</p>
        <p>The parameters are updated T times using D_train once in one epoch. We use 500 epochs. In each, we compute a validation loss using D_val. We apply early stopping and choose the parameters with the lowest loss.</p>
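        <p>The batch loop over steps (b)–(d) with epoch-level validation can be sketched as follows; the three callables are hypothetical stand-ins for the R/P refinement plus G prediction, the loss in (1), and the optimizer step, respectively.</p>

```python
import random

def train(train_data, val_data, predict, loss_fn, update_params,
          num_batches, epochs=5):
    # predict: refine with R/P and apply G to a batch -> predictions.
    # loss_fn: cross entropy over a batch, as in Eq. (1).
    # update_params: one optimizer step given the batch loss.
    best_val = float("inf")
    for _ in range(epochs):
        random.shuffle(train_data)
        size = max(1, len(train_data) // num_batches)
        for t in range(num_batches):                     # repeat (b)-(d)
            batch = train_data[t * size:(t + 1) * size]
            update_params(loss_fn(batch, predict(batch)))
        val_loss = loss_fn(val_data, predict(val_data))
        best_val = min(best_val, val_loss)               # early stopping
    return best_val

# Toy run with constant stubs; a real run would use the trained modules.
stub_predict = lambda batch: [0.5 for _ in batch]
stub_loss = lambda batch, preds: 0.25
best = train([1, 2, 3, 4], [5, 6], stub_predict, stub_loss,
             lambda loss: None, num_batches=2)
```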
      </sec>
    </sec>
    <sec id="sec-4">
      <title>5. Experimental Results</title>
      <p>
        To verify the broad applicability of our framework,
we conduct extensive experiments using synthetic
and real-world datasets. We compare it with six other
algorithms: MM-sum [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], MM-prod [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], SGD-HOI [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ],
TrueSkill [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], Rank Centrality [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], and an algorithm
based on a single-layer neural network [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. SGD-HOI
has been developed for the factorization HOI model in
[
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] and TrueSkill for the Thurstone model in [
        <xref ref-type="bibr" rid="ref4">4</xref>
          ]. As
Rank Centrality has been developed for the case of comparing two individual items, we consider its natural extensions depending on the dataset. For example, in the sum model, we replace the summation in (2) by a summation over the observed group comparisons (A, B) ∈ D_obs with i ∈ A, using group utility estimates in place of individual ones.
      </p>
      <sec id="sec-4-1">
        <title>5.1. Synthetic Data Experiments</title>
        <p>We use four synthetic datasets: BTL-sum, BTL-product, HOI and a generalized Thurstone. We set n = 300 and M = 5. In the HOI model, we generate the ground-truth utilities and dimension-7 features using Gaussian distributions. In the others, we generate the ground-truth utilities uniformly at random. We generate 5n log n distinct paired groups and each pair is compared 10 times.⁶</p>
        <p>We split generated datasets randomly into D_obs (90%) and D_unobs (10%). All algorithms use D_obs to predict unobserved group comparisons in D_unobs. They use a fraction (1%–2%) of D_obs for validation purposes. We use L = 20 and the learning rate of 10⁻². We use four hidden layers for all modules. For R and P, each layer consists of 2⁷ nodes, and for G, each layer consists of 2⁹ nodes.</p>
      </sec>
      <sec id="sec-4-2">
        <title>5.2. Real-World Data Experiments</title>
        <p>As in Section 5.1, we split real-world datasets randomly into D_obs (90%) and D_unobs (10%), and use a fraction (1%–2%) of D_obs for validation purposes if necessary. We use five real-world datasets⁷: GIFGIF, HOTS, DOTA 2, LoL, IMDb 5000. We use L = 30, 15, 15, 20, 20 and the learning rates of 10⁻³, 10⁻³, 10⁻², 10⁻², 10⁻² respectively. We use four hidden layers for all modules. Each layer consists of 2⁷ nodes for R and P, and 2⁹ nodes for G.</p>
        <p>⁶The 5n log n is chosen in order to guarantee connectedness between any pair of nodes within a comparison (hyper-)graph where nodes represent items and a hyper-edge represents a group comparison [<xref ref-type="bibr" rid="ref16">16</xref>]. Without connectedness guarantees, some algorithms may fail to produce meaningful estimates.</p>
        <p>Let us discuss each dataset. (1) GIFGIF: A crowd-sourcing project.⁷ We use the dataset pre-processed in [<xref ref-type="bibr" rid="ref17">17</xref>]. A participant is presented with two images and asked to choose one which better describes a given emotion.⁸ This dataset belongs to a special case of our interest as individual comparisons are concerned. We consider the emotion of happiness. We have 6,120 images and 106,886 samples. (2) HOTS: A collection of HOTS match records from 10/26/17 to 11/26/17 collected by HOTS logs.⁷ Each match consists of two groups with five players each. The players choose heroes for each match out of a pool of 84. We choose high-quality matches only where all players are highly-skilled according to some available statistics. There are 26,486 records. (3) DOTA 2: A collection of DOTA 2 match records.⁷ Each match consists of two groups with five players each, and they choose heroes out of a pool of 113. There are 50,000 records. (4) LoL: A collection of LoL professional match records.⁷ Two groups with five players each compete. The players choose heroes out of a pool of 140. There are 7,610 records. (5) IMDb 5000: A collection of meta-data for 5,000 movies.⁷ Each movie has a score and is associated with keywords. To fit our purpose, we generate match records for movie pairs. We consider each movie as a group and its five keywords as its items. Given a pair, we declare a win for the one with a higher score. We have 8,021 keywords and 123,420 samples.</p>
        <p>In addition to the cross entropy loss, we also consider another metric highly relevant in practice: prediction accuracy. We declare it a win for a group if the estimate of its winning probability is above a certain threshold, which we set as 0.5 [<xref ref-type="bibr" rid="ref18">18</xref>]. Thus, the prediction accuracy is expressed as follows⁹:</p>
        <p>(1/|D_unobs|) ∑_(A,B)∈D_unobs y_AB 1≥1/2(ŷ_AB) + (1 − y_AB) 1&lt;1/2(ŷ_AB). (8)</p>
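        <p>The accuracy metric (8) can be computed as follows; a minimal sketch with illustrative containers, using the 0.5 threshold and the indicator convention of Footnote 9.</p>

```python
def prediction_accuracy(samples, y_hat):
    # samples: list of (A, B, y) over unobserved comparisons.
    # y_hat: dict mapping (A, B) -> estimated winning probability.
    # A sample counts as correct when the estimate falls on the
    # observed winner's side of the 1/2 threshold, as in Eq. (8).
    correct = 0
    for A, B, y in samples:
        p = y_hat[(A, B)]
        correct += y * (1 if p >= 0.5 else 0) + (1 - y) * (1 if p < 0.5 else 0)
    return correct / len(samples)

unobs = [("A1", "B1", 1), ("A2", "B2", 0), ("A3", "B3", 1)]
est = {("A1", "B1"): 0.7, ("A2", "B2"): 0.4, ("A3", "B3"): 0.3}
acc = prediction_accuracy(unobs, est)  # 2 of 3 predictions correct
```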
        <p>Table 1 shows our result. We boldface the best
performances and underline the second-best. The numbers
in parentheses indicate the ranks among the algorithms
being compared in a given setup of dataset and metric.</p>
      <p>Our framework consistently yields the top performances in all cases, while the others suffer from significantly inconsistent performances across datasets and/or metrics.</p>
      <p>⁷gifgif.media.mit.edu (GIFGIF); hotslogs.com/Info/API (HOTS); kaggle.com/devinanzelmo/dota-2-matches (DOTA 2); kaggle.com/chuckephron/leagueoflegends (LoL); kaggle.com/carolzhangdc/imdb-5000-movie-dataset (IMDb 5000).</p>
      <p>⁸“Neither” is also allowed, but we exclude such data.</p>
      <p>⁹We define 1≥1/2(x) as 1 if x ≥ 1/2 and as 0 otherwise. Similarly, 1&lt;1/2(x) equals 1 if x &lt; 1/2 and 0 otherwise.</p>
        <p>To verify that our design of modules R and P plays a
critical role in achieving consistently high performances
across various scenarios, we conduct ablation studies.</p>
        <p>As our baseline, we consider the case where we refine
initialized estimates  (0)using R and P once ( = 1 ), data
augmentation is not applied, and the model parameters
of R and P are not updated. Thus, module G takes as
input coarse utility estimates and produces as output
a group comparison outcome prediction. This baseline
can be viewed as a naive neural network based approach
without the key reward-penalty mechanisms.</p>
        <p>Table 2 shows our result. The best performance
improvements (boldfaced) reach up to 12.291% in cross
entropy loss and 14.764% in prediction accuracy. The
average performance improvements are 4.511% in cross
entropy loss and 7.12% in prediction accuracy. Note that
across all datasets and metrics, it is empirically
demonstrated that the reward-penalty structure by modules R
and P brings consistent improvements.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>6. Conclusion</title>
      <p>We explore the group match prediction problem where
the goal is to predict the probability of one group
preferred over the other given an unseen pair of groups,
based on partially observed group comparison data. We
develop a unified algorithmic framework that employs
neural networks and show that it can yield consistently
best performances compared to other state-of-the-art
algorithms on multiple datasets across various domains.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>This work was supported by the ICT R&amp;D program of
Institute of Information &amp; Communications Technology
Planning &amp; Evaluation (IITP) grant funded by the Korea
government (MSIT) (No. 2020-0-00626).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>T. K.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. J.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. C.</given-names>
            <surname>Weng</surname>
          </string-name>
          ,
          <article-title>Generalized Bradley-Terry models and multi-class probability estimates</article-title>
          ,
          <source>Journal of Machine Learning Research</source>
          <volume>7</volume>
          (
          <year>2006</year>
          )
          <fpage>85</fpage>
          -
          <lpage>115</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>T. K.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. J.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. C.</given-names>
            <surname>Weng</surname>
          </string-name>
          ,
          <article-title>Ranking individuals by group comparisons</article-title>
          ,
          <source>Journal of Machine Learning Research</source>
          <volume>9</volume>
          (
          <year>2008</year>
          )
          <fpage>2187</fpage>
          -
          <lpage>2216</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Cheng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Fujii</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Hsieh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.-J.</given-names>
            <surname>Hsieh</surname>
          </string-name>
          ,
          <article-title>Learning from group comparisons: Exploiting higher order interactions</article-title>
          ,
          <source>Advances in Neural Information Processing Systems</source>
          (
          <year>2018</year>
          )
          <fpage>4981</fpage>
          -
          <lpage>4990</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>R.</given-names>
            <surname>Herbrich</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Minka</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Graepel</surname>
          </string-name>
          ,
          <article-title>TrueSkill™: A Bayesian skill rating system</article-title>
          ,
          <source>Advances in Neural Information Processing Systems</source>
          (
          <year>2007</year>
          )
          <fpage>569</fpage>
          -
          <lpage>576</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>S.</given-names>
            <surname>Negahban</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Oh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Shah</surname>
          </string-name>
          ,
          <article-title>Rank centrality: Ranking from pairwise comparisons</article-title>
          ,
          <source>Operations Research</source>
          <volume>65</volume>
          (
          <year>2016</year>
          )
          <fpage>266</fpage>
          -
          <lpage>287</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>D. R.</given-names>
            <surname>Hunter</surname>
          </string-name>
          ,
          <article-title>MM algorithms for generalized Bradley-Terry models</article-title>
          ,
          <source>Annals of Statistics</source>
          (
          <year>2004</year>
          )
          <fpage>384</fpage>
          -
          <lpage>406</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>J. E.</given-names>
            <surname>Menke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. R.</given-names>
            <surname>Martinez</surname>
          </string-name>
          ,
          <article-title>A Bradley-Terry artificial neural network model for individual ratings in group competitions</article-title>
          ,
          <source>Neural Computing and Applications</source>
          <volume>17</volume>
          (
          <year>2008</year>
          )
          <fpage>175</fpage>
          -
          <lpage>186</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>R. A.</given-names>
            <surname>Bradley</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. E.</given-names>
            <surname>Terry</surname>
          </string-name>
          ,
          <article-title>Rank analysis of incomplete block designs: I. the method of paired comparisons</article-title>
          ,
          <source>Biometrika</source>
          <volume>39</volume>
          (
          <year>1952</year>
          )
          <fpage>324</fpage>
          -
          <lpage>345</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>C.</given-names>
            <surname>DeLong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Pathak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Erickson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Perrino</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Shim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Srivastava</surname>
          </string-name>
          ,
          <article-title>TeamSkill: Modeling team chemistry in online multi-player games</article-title>
          ,
          <source>Pacific-Asia Conference on Knowledge Discovery and Data Mining</source>
          (
          <year>2011</year>
          )
          <fpage>519</fpage>
          -
          <lpage>531</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>M.</given-names>
            <surname>Jang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Suh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Oh</surname>
          </string-name>
          ,
          <article-title>Optimal sample complexity of M-wise data for top-K ranking</article-title>
          ,
          <source>Advances in Neural Information Processing Systems</source>
          (
          <year>2017</year>
          )
          <fpage>1686</fpage>
          -
          <lpage>1696</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Fan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Ma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <article-title>Spectral method and regularized MLE are both optimal for top-K ranking</article-title>
          ,
          <source>Annals of Statistics</source>
          <volume>47</volume>
          (
          <year>2019</year>
          )
          <fpage>2204</fpage>
          -
          <lpage>2235</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>X.</given-names>
            <surname>Glorot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bordes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Bengio</surname>
          </string-name>
          ,
          <article-title>Deep sparse rectifier neural networks</article-title>
          ,
          <source>Proceedings of the fourteenth international conference on artificial intelligence and statistics</source>
          (
          <year>2011</year>
          )
          <fpage>315</fpage>
          -
          <lpage>323</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>K.</given-names>
            <surname>Hornik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Stinchcombe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>White</surname>
          </string-name>
          ,
          <article-title>Multilayer feedforward networks are universal approximators</article-title>
          ,
          <source>Neural Networks</source>
          <volume>2</volume>
          (
          <year>1989</year>
          )
          <fpage>359</fpage>
          -
          <lpage>366</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>X.</given-names>
            <surname>Glorot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Bengio</surname>
          </string-name>
          ,
          <article-title>Understanding the difficulty of training deep feedforward neural networks</article-title>
          ,
          <source>International Conference on Artificial Intelligence and Statistics</source>
          (
          <year>2010</year>
          )
          <fpage>249</fpage>
          -
          <lpage>256</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>D. P.</given-names>
            <surname>Kingma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Ba</surname>
          </string-name>
          ,
          <article-title>Adam: A method for stochastic optimization</article-title>
          ,
          <source>arXiv preprint arXiv:1412.6980</source>
          (
          <year>2014</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>O.</given-names>
            <surname>Cooley</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Kang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Koch</surname>
          </string-name>
          ,
          <article-title>Threshold and hitting time for high-order connectedness in random hypergraphs</article-title>
          ,
          <source>Electronic Journal of Combinatorics</source>
          (
          <year>2016</year>
          )
          <fpage>2</fpage>
          -
          <lpage>48</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>L.</given-names>
            <surname>Maystre</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Grossglauser</surname>
          </string-name>
          ,
          <article-title>Just sort it! A simple and effective approach to active preference learning</article-title>
          ,
          <source>International Conference on Machine Learning</source>
          (
          <year>2017</year>
          )
          <fpage>2344</fpage>
          -
          <lpage>2353</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>O.</given-names>
            <surname>Delalleau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Contal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Thibodeau-Laufer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. C.</given-names>
            <surname>Ferrari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Bengio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <article-title>Beyond skill rating: Advanced matchmaking in ghost recon online</article-title>
          ,
          <source>IEEE Transactions on Computational Intelligence and AI in Games</source>
          <volume>4</volume>
          (
          <year>2012</year>
          )
          <fpage>167</fpage>
          -
          <lpage>177</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>