Knowledge Intensive Learning of Generative Adversarial Networks

Devendra Singh Dhami, The University of Texas at Dallas, devendra.dhami@utdallas.edu
Mayukh Das, Samsung Research India, mayukh.das@samsung.com
Sriraam Natarajan, The University of Texas at Dallas, sriraam.natarajan@utdallas.edu
ABSTRACT
While Generative Adversarial Networks (GANs) have accelerated the use of generative modelling within the machine learning community, most applications of GANs are restricted to images. The use of GANs to generate clinical data has been rare due to the inability of GANs to faithfully capture the intrinsic relationships between features. We hypothesize and verify that this challenge can be mitigated by incorporating domain knowledge in the generative process. Specifically, we propose human-allied GANs that use correlation advice from humans to create synthetic clinical data. Our empirical evaluation demonstrates the superiority of our approach over other GAN models.

CCS CONCEPTS
• Deep Learning → Generative Adversarial Networks; • Application → Healthcare; • Learning → Knowledge Intensive Learning.

KEYWORDS
generative adversarial networks, human in the loop, healthcare

ACM Reference Format:
Devendra Singh Dhami, Mayukh Das, and Sriraam Natarajan. 2020. Knowledge Intensive Learning of Generative Adversarial Networks. In Proceedings of KDD Workshop on Knowledge-infused Mining and Learning (KiML'20). 6 pages. https://doi.org/10.1145/nnnnnnn.nnnnnnn

In M. Gaur, A. Jaimes, F. Ozcan, S. Shah, A. Sheth, B. Srivastava, Proceedings of the Workshop on Knowledge-infused Mining and Learning (KDD-KiML 2020). San Diego, California, USA, August 24, 2020. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). © 2020 Copyright held by the author(s). https://doi.org/10.1145/nnnnnnn.nnnnnnn

1 INTRODUCTION
Deep learning models have reshaped the machine learning landscape over the past decade [16, 29]. Specifically, Generative Adversarial Networks (GANs) [17] have found tremendous success in generating examples for images [34, 37, 45], photographs of human faces [1, 25, 52], image-to-image translation [30, 33, 55] and 3D object generation [44, 51, 53], to name a few. Despite such success, several key factors limit the widespread adoption of GANs for a broader range of tasks: the widely acknowledged data-hungry nature of such methods, potential access issues with real medical data and, finally, their restricted usage, mainly in the context of images. These factors have limited the use of these arguably successful techniques in medical (or similar) domains. Recently, however, synthetic data generation has become a centerpiece of research in medical AI due to the diverse difficulties in the collection, persistence, sharing and analysis of real clinical data.

We aim to address the above limitations. Inspired by Mitchell's argument of "The Need for Biases in Learning Generalizations" [38], we mitigate the challenges of existing data-hungry methods via inductive bias while learning GANs. We show that effective inductive bias can be provided by humans in the form of domain knowledge [14, 27, 41, 50]. Rich human advice can effectively balance the impact of the quality (sparsity) of training data. Data quality also contributes to the well-studied modal instability of GANs. This problem is especially critical in domains such as medical/clinical analytics that, unlike images, do not typically exhibit 'spatial homophily' [21] and are prone to distributional diversity among feature clusters as well. Our human-guided framework proposes a robust strategy to address this challenge. Note that in our setting the human is an ally and not an adversary.

The second limitation, access, is crucial for medical data generation. Access to existing medical databases [10, 18] is hard due to cost and access concerns, and thus synthetic data generation holds tremendous promise [6, 13, 19, 35, 48]. While previous methods generated synthetic images, we go beyond images and generate clinical data. Building on this body of work, we present a synthetic data generation framework that effectively exploits domain expertise to handle data quality.

We make a few key contributions:
(1) We demonstrate how effective human advice can be provided to a GAN as an inductive bias.
(2) We present a method for generating data given this advice.
(3) We demonstrate the effectiveness and efficacy of our approach on 2 de-identified clinical data sets. Our method is generalizable to multiple modalities of data and is not necessarily restricted to images.
(4) Yet another feature of this approach is that training occurs from very few data samples (< 50 in one domain), thus providing human guidance as a data generation alternative.

2 RELATED WORK
The key principle behind GANs [17] is a zero-sum game [26] from game theory, a mathematical representation where each participant's gain or loss is exactly balanced by the losses or gains of the other participants, and which is generally solved by a minimax algorithm. The generator distribution $p_{data}(\boldsymbol{x})$ over the given data $\boldsymbol{x}$ is learned by sampling $\boldsymbol{z}$ from a random distribution $p_{\boldsymbol{z}}(\boldsymbol{z})$ (initially uniform was proposed, but Gaussians have been proven superior [2]). While GANs have proven to be a powerful framework for estimating generative distributions, the convergence dynamics of the naive minimax algorithm have been shown to be unstable. Some recent approaches, among many others, augment learning either via statistical relationships between the true and learned generative distributions, such as the Wasserstein-1
distance [3] and MMD [32], or via spectral normalization of the parameter space of the generator [39], which keeps the generator distribution from drifting too far. Although these approaches have improved GAN learning in some cases, there is room for improvement.
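For reference (a summary of the standard objectives, not a contribution of this paper), the original GAN [17] solves the minimax game

\[
\min_{G}\max_{D}\; \mathbb{E}_{\boldsymbol{x}\sim p_{data}}[\log D(\boldsymbol{x})] + \mathbb{E}_{\boldsymbol{z}\sim p_{\boldsymbol{z}}}[\log(1 - D(G(\boldsymbol{z})))]
\]

while WGAN [3] instead minimizes the Wasserstein-1 (earth mover's) distance

\[
W(p_{data}, p_g) = \inf_{\gamma \in \Pi(p_{data},\, p_g)} \mathbb{E}_{(\boldsymbol{x},\boldsymbol{y})\sim\gamma}\big[\lVert \boldsymbol{x} - \boldsymbol{y} \rVert\big]
\]

between the real distribution $p_{data}$ and the model distribution $p_g$.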
Guidance via human knowledge is a provably effective way to control learning in the presence of systematic noise (which leads to instability). One typical strategy to incorporate such guidance is to provide rules over training examples and features. Some of the earliest approaches are explanation-based learning (EBL-NN [49]) and ANNs augmented with symbolic rules (KBANN [50]). Various widely-studied techniques for leveraging domain knowledge for optimal model generalization include polyhedral constraints in the case of knowledge-based SVMs [9, 14, 28, 47], preference rules [5, 27, 41, 42] and qualitative constraints (e.g., monotonicities/synergies [54] or quantitative relationships [15]). Notably, whereas these models exhibit considerable improvement with the incorporation of human knowledge, there is only limited use of such knowledge in training GANs. Our approach resembles the qualitative constraints framework in spirit.

While widely successful in building optimally generalized models in the presence of systematic noise (or sample biases), knowledge-based approaches have mostly been explored in the context of discriminative modeling. In the generative setting, a recent work extends the principle of posterior regularization from Bayesian modeling to deep generative models in order to incorporate structured domain knowledge [22]. Traditionally, knowledge-based generative learning has been studied as a part of learning probabilistic graphical models with structure/parameter priors [36]. We aim to extend the use of knowledge to the generative model setting.

3 KNOWLEDGE INTENSIVE LEARNING OF GENERATIVE ADVERSARIAL NETWORKS
A notable disadvantage of the adversarial training formulation is that the training is slow and unstable, leading to mode collapse [2], where the generator starts generating data of only a single modality. This has resulted in GANs not being exploited to their full potential in generating synthetic non-image clinical data. Human advice can encourage exploration in diverse areas of the feature space and helps learn more stable models [43]. Hence, we propose a human-allied GAN architecture (HA-GAN) (Figure 1). The architecture incorporates human advice in the form of feature correlations. Such intrinsic relationships between the features are crucial in medical data sets and thus become a natural candidate as additional knowledge/advice in guided model learning for faithful data generation.

Our approach builds upon a GAN architecture [17] where a random noise vector is provided to the generator, which tries to generate examples as close to the real distribution as possible. The discriminator tries to distinguish between real examples and ones generated by the generator. The generator tries to maximize the probability that the discriminator makes a mistake, and the discriminator tries to minimize its mistakes, resulting in a min-max optimization problem which can be solved by a minimax algorithm. We adopt the Wasserstein GAN (WGAN) architecture¹ [3, 20], which focuses on defining a distance/divergence (the Wasserstein or earth mover's distance) to measure the closeness between the real distribution and the model distribution.

¹We use 'GAN' to indicate 'W-GAN'.

3.1 Human input as inductive bias
Historically, two approaches have been studied for using guidance as bias. The first is to provide advice on the labels as constraints or preferences that control the search space. Some example advice rules on the labels include: (3 ≤ feature1 ≤ 5) ⇒ label = 1 and (0.6 ≤ feature2 ≤ 0.8) ∧ (4 ≤ feature3 ≤ 5) ⇒ label = 0. Such advice is more relevant in a discriminative setting but is not ideal for GANs: since GANs are shown to be sensitive to the training data, and here the labels themselves are being generated, the labels should not be altered during training. The second is via correlations between features as preferences (our approach), which allows for a faithful representation of diverse modality.

Advice injection: After every fixed number of iterations N, we calculate the correlation matrix of the generated data G1 and provide a set of advice ψ on the correlations between different features. Consider the following motivating example for the use of correlations as a form of advice.

Example: Consider predicting heart attack with 3 features: cholesterol, blood pressure (BP) and income. The values of the given features can vary (sometimes widely) between different patients due to several latent factors (e.g., smoking habits). It is difficult to assume any specific distribution. In other words, it is difficult to deduce whether the values for the features come from the same distribution (even though the feature values in the data set are similar).

We modify the correlation coefficients (for both positive and negative correlations) between the features by increasing them if the human advice suggests that two features are highly correlated, and decreasing them if the advice suggests otherwise.

Example: Continuing the above example, since a rise in the cholesterol level can lead to a rise in BP and vice versa, expert advice here can suggest that cholesterol and BP should be highly correlated. Also, as income may not contribute directly to BP and cholesterol levels, another advice here can be to de-correlate cholesterol/BP and income level.

The example advice rules ∈ ψ are: 1. Correlation("cholesterol level", "BP")↑, 2. Correlation("cholesterol level", "income level")↓ and 3. Correlation("BP", "income level")↓, where ↑ and ↓ indicate increase and decrease respectively. Based on the 1st advice we need to increase the correlation coefficient between cholesterol level and BP. Then

\[
\mathbf{C} = \begin{bmatrix} 1 & 0.2 & 0.3 \\ 0.2 & 1 & 0.07 \\ 0.3 & 0.07 & 1 \end{bmatrix}
\qquad
\mathbf{A} = \begin{bmatrix} 1 & \lambda & 1 \\ \lambda & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix}
\tag{1}
\]

Here C is the correlation matrix, A is the advice matrix, and λ is the factor by which the correlation value is to be augmented. In the case where we need to increase the value of the correlation coefficient, λ should be > 1. We keep λ = 1/max(|C|), where the maximum is taken over the off-diagonal entries of C. Since −1.0 ≤ ∀c ∈ C ≤ 1.0, in this case the value of λ ≥ 1.0, leading to enhanced correlation via
Figure 1: Human-Allied GAN. Correlation advice takes the generated distribution closer to the real distribution.


Hadamard product. Thus the new correlation matrix Ĉ is

\[
\hat{\mathbf{C}} = \mathbf{C} \odot \mathbf{A} = \begin{bmatrix} 1 & 0.2 & 0.3 \\ 0.2 & 1 & 0.07 \\ 0.3 & 0.07 & 1 \end{bmatrix} \odot \begin{bmatrix} 1 & \frac{1}{0.3} & 1 \\ \frac{1}{0.3} & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 0.667 & 0.3 \\ 0.667 & 1 & 0.07 \\ 0.3 & 0.07 & 1 \end{bmatrix}
\tag{2}
\]

If the advice says that features have low correlations (2nd rule in the example), we decrease the correlation coefficient. Now λ must be < 1, and we set λ = max(|C|) (again over the off-diagonal entries). Since −1 ≤ ∀c ∈ C ≤ 1.0, the value of λ ≤ 1.0. Thus multiplying by λ will decrease the correlation value, and the new correlation matrix is

\[
\hat{\mathbf{C}}_1 = \hat{\mathbf{C}} \odot \mathbf{A} = \begin{bmatrix} 1 & 0.667 & 0.3 \\ 0.667 & 1 & 0.07 \\ 0.3 & 0.07 & 1 \end{bmatrix} \odot \begin{bmatrix} 1 & 1 & 0.3 \\ 1 & 1 & 0.3 \\ 0.3 & 0.3 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 0.667 & 0.09 \\ 0.667 & 1 & 0.021 \\ 0.09 & 0.021 & 1 \end{bmatrix}
\tag{3}
\]

This is used to create the new generated data G̃1. For negative correlations, the process is unchanged.
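To make the advice-injection step concrete, here is a minimal NumPy sketch of the Hadamard update of equations (1)-(3). The function name, the dictionary encoding of ψ, and the choice to compute both λ factors from the off-diagonal entries of C are our own illustrative assumptions, not the authors' released code.

```python
import numpy as np

def inject_advice(corr, advice):
    """Hadamard-style advice update of a correlation matrix (Eqs. (1)-(3)).

    corr   : (d, d) correlation matrix C of the generated data G1.
    advice : dict mapping feature-index pairs (i, j) to 'up' or 'down'.
    """
    c_hat = corr.copy()
    off_diag = np.abs(corr - np.eye(len(corr)))
    lam_up = 1.0 / off_diag.max()    # lambda > 1: strengthen a correlation
    lam_down = off_diag.max()        # lambda < 1: weaken a correlation
    for (i, j), direction in advice.items():
        lam = lam_up if direction == 'up' else lam_down
        c_hat[i, j] *= lam           # entry-wise (Hadamard) update
        c_hat[j, i] *= lam           # keep the matrix symmetric
    return np.clip(c_hat, -1.0, 1.0)

# Worked example: increase corr(cholesterol, BP) and decrease the
# correlations of income with both, reproducing Eqs. (2)-(3).
C = np.array([[1.0, 0.2, 0.3],
              [0.2, 1.0, 0.07],
              [0.3, 0.07, 1.0]])
print(inject_advice(C, {(0, 1): 'up', (0, 2): 'down', (1, 2): 'down'}))
```

On the example matrix this yields exactly the Ĉ1 of equation (3).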
3.2 Advice-guided data generation
After Ĉ1 is constructed, we next generate data satisfying the constraints. To this effect, we employ the Iman-Conover method [23], a distribution-free method to define dependencies between distributional variables based on rank correlations such as Spearman or Kendall's tau. We use Pearson correlations, since we deal with linear relationships between the features, we assume a normal distribution, and the Pearson coefficient has been shown to perform equally well with the Iman-Conover method [40] due to the close relationship between Pearson and Spearman correlations. Further, we assume that the features are Gaussian, justified by the fact that most lab test data is continuous. The Iman-Conover method consists of the following steps:

[Step 1]: Create a random standardized matrix M with values x ∈ M drawn from a Gaussian distribution. This is obtained by the process of inverse transform sampling, described next. Let V be a uniformly distributed random variable and CDF be the cumulative distribution function. For a sampled point v, CDF(v) = P(V ≤ v). Thus, to generate samples, the values v ∼ V are passed through CDF⁻¹ to obtain the desired values x [CDF⁻¹(v) = {x | CDF(x) ≤ v, v ∈ [0, 1]}]. Thus, for a Gaussian,

\[
\mathrm{CDF}(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} \exp\Big(\frac{-x^2}{2}\Big)\, dx = \frac{1}{\sqrt{2\pi}} \int_{0}^{x} \exp\Big(\frac{-x^2}{2}\Big)\, dx = \Big[-\exp\Big(\frac{-x^2}{2}\Big)\Big]_{0}^{x}
\tag{4}
\]

The inverse CDF can thus be written as CDF⁻¹(v) = 1 − exp(−x²/2) ≤ v, and the desired values x ∈ M can be obtained as x = √(2 ln(1 − v)).

[Step 2]: Calculate the correlation matrix E of M.

[Step 3]: Calculate the Cholesky decomposition F of the correlation matrix E. The Cholesky decomposition [46] of a positive-definite matrix is given as the product of a lower triangular matrix and its conjugate transpose. Note that for the Cholesky decomposition to be unique, the target matrix should be positive definite (such as the covariance matrix), whereas the correlation matrix used in our algorithm is only positive semi-definite. We enforce positive-definiteness by repeated addition of very small values to the diagonal of the correlation matrix until positive-definiteness is ensured. Given a symmetric and positive definite matrix E, its Cholesky decomposition F is such that E = F · F⊤.

[Step 4]: Calculate the Cholesky decomposition Q of the correlation matrix obtained after the modifications based on human advice, Ĉ. As above, the Cholesky decomposition is such that Ĉ = Q · Q⊤.

[Step 5]: Calculate the reference matrix T by transforming the sampled matrix M from Step 1 to have the desired correlations of Ĉ, using their Cholesky decompositions.

[Step 6]: Rearrange the values in the columns of the generated data G1 to have the same ordering as the corresponding columns in the reference matrix T, obtaining the final generated data G̃1.
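The six steps can be sketched compactly in NumPy as below. The explicit Step 5 transform T = M F⁻ᵀ Qᵀ is the usual Iman-Conover choice implied by the text (it cancels M's incidental correlation and imposes Ĉ); the helper names are ours, and the standard-normal scores stand in for the inverse transform sampling of Step 1.

```python
import numpy as np

def make_positive_definite(corr, eps=1e-10):
    """Step 3 fix-up: add small values to the diagonal until the matrix
    admits a Cholesky decomposition."""
    fixed = corr.copy()
    while True:
        try:
            np.linalg.cholesky(fixed)
            return fixed
        except np.linalg.LinAlgError:
            fixed = fixed + eps * np.eye(len(fixed))
            eps *= 10

def iman_conover(g1, c_hat, rng=None):
    """Reorder the columns of generated data g1 (n samples x d features)
    so that they carry the advice-modified correlations in c_hat."""
    if rng is None:
        rng = np.random.default_rng(0)
    n, d = g1.shape
    m = rng.standard_normal((n, d))                            # [Step 1] scores M
    e = make_positive_definite(np.corrcoef(m, rowvar=False))   # [Step 2] E
    f = np.linalg.cholesky(e)                                  # [Step 3] E = F F^T
    q = np.linalg.cholesky(make_positive_definite(c_hat))      # [Step 4] C_hat = Q Q^T
    t = m @ np.linalg.inv(f).T @ q.T                           # [Step 5] reference T
    # [Step 6] rank-reorder every column of g1 to follow the ordering in T
    ranks = t.argsort(axis=0).argsort(axis=0)
    return np.take_along_axis(np.sort(g1, axis=0), ranks, axis=0)
```

Because only the within-column ordering of G1 changes, the marginal distributions of the generated features are preserved exactly.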
Cholesky decomposition to model correlations: Given a randomly generated data set P with no correlations, a correlation matrix C and its Cholesky decomposition Q, data that faithfully follows the given correlations ∈ C can be generated as the product of the obtained lower triangular matrix with the original uncorrelated data, i.e., P̂ = QP. The correlation of the newly obtained data P̂ is

\[
\mathit{Corr}(\hat{\mathbf{P}}) = \frac{\mathit{Cov}(\hat{\mathbf{P}})}{\sigma_{\hat{\mathbf{P}}}} = \frac{\mathbb{E}[\hat{\mathbf{P}}\hat{\mathbf{P}}^{\top}] - \mathbb{E}[\hat{\mathbf{P}}]\,\mathbb{E}[\hat{\mathbf{P}}]^{\top}}{\sigma_{\hat{\mathbf{P}}}}
\tag{5}
\]

Since we consider data P̂ from a Gaussian distribution with zero mean and unit variance,

\[
\mathit{Corr}(\hat{\mathbf{P}}) = \mathbb{E}[\hat{\mathbf{P}}\hat{\mathbf{P}}^{\top}] = \mathbb{E}[(\mathbf{Q}\mathbf{P})(\mathbf{Q}\mathbf{P})^{\top}] = \mathbb{E}[\mathbf{Q}\mathbf{P}\mathbf{P}^{\top}\mathbf{Q}^{\top}] = \mathbf{Q}\,\mathbb{E}[\mathbf{P}\mathbf{P}^{\top}]\,\mathbf{Q}^{\top} = \mathbf{Q}\mathbf{Q}^{\top} = \mathbf{C}
\tag{6}
\]

Thus the Cholesky decomposition captures the desired correlations faithfully and can be used for generating correlated data. Since we already have a normal sampled matrix M and a calculated correlation matrix E of M, we need to calculate a reference matrix (Step 5).
3.3 Human-Allied GAN training
Since the human expert advice is provided independently of the GAN architecture, our method is agnostic of the underlying GAN architecture. We make use of the Wasserstein GAN (WGAN) architecture since it has been shown to be more stable while training and can handle mode collapse [3]. Only the error backpropagation values differ depending on whether we are using the data generated by the underlying GAN or the data generated by the Iman-Conover method. Our algorithm starts with the general process of training a GAN, where the generator takes random noise as input and generates data which is then passed, along with the real data, to the discriminator. The discriminator tries to identify the real and generated data, and the error is backpropagated to the generator. After every specified number of iterations, the correlation matrix of the features C in the generated data is obtained, and a new correlation matrix Ĉ is obtained with respect to the expert advice (Section 3.1). A new data set is generated with respect to Ĉ using the Iman-Conover method (Section 3.2) and then passed to the discriminator along with the real data set.
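Putting the pieces together, the control flow can be sketched as follows. Here `wgan_step` stands in for one standard WGAN generator/critic update and `generator.sample`, `inject_advice` and `iman_conover` refer to the sketches above; this illustrates the loop described in this section, not the authors' implementation.

```python
import numpy as np

def train_ha_gan(real_data, generator, critic, advice,
                 iters=10_000, advice_every=1_000, rng=None):
    """Sketch of HA-GAN training: a plain WGAN loop with periodic
    advice-guided correction of the generated batch."""
    if rng is None:
        rng = np.random.default_rng(0)
    for it in range(1, iters + 1):
        noise = rng.standard_normal((len(real_data), generator.noise_dim))
        g1 = generator.sample(noise)                 # generated batch G1
        wgan_step(generator, critic, real_data, g1)  # usual adversarial update
        if it % advice_every == 0:                   # advice-injection point
            c = np.corrcoef(g1, rowvar=False)        # correlations of G1
            c_hat = inject_advice(c, advice)         # Section 3.1
            g1_tilde = iman_conover(g1, c_hat)       # Section 3.2
            # G~1 is scored against the real data and the resulting error
            # is backpropagated to the generator
            wgan_step(generator, critic, real_data, g1_tilde)
    return generator
```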
4 EXPERIMENTAL EVALUATION
We aim to answer the following questions:
Q1: Does providing advice to GANs help in generating better quality data?
Q2: Are GANs with advice effective for data sets that have few examples?
Q3: How does bad advice affect the quality of generated data?
Q4: How well does human advice handle class imbalance?
Q5: How does our method compare to state-of-the-art GAN architectures?

We consider 2 real clinical data sets.
(1) Nephrotic Syndrome is a novel data set of symptoms that indicate kidney damage. It consists of 50 kidney biopsy images along with the clinical reports, sourced from Dr Lal PathLabs, India (https://www.lalpathlabs.com/). We use the clinical reports, which consist of the values for kidney tissue diagnosis; these can confirm the clinical diagnosis, help to identify high-risk patients, influence treatment decisions and help medical practitioners to plan and prognosticate treatments. The data consists of 19 features with 44 positive and 6 negative examples.

(2) MIMIC database [24] consists of de-identified information of patients admitted to critical care units at a large tertiary care hospital. The features included are predominantly time-window aggregations of physiological measurements from the medical records. We selected relevant lab results, vital sign observations and feature aggregations. The data consists of 18 features with 5813 positive and 40707 negative examples.

Advice Acquisition: Here we compile the sources from which we obtain the advice.

(1) Nephrotic Syndrome: This is a novel real data set and the advice is obtained from a nephrologist in India. According to the problem statement from the expert, nephrotic syndrome involves the loss of a lot of protein and nephritic syndrome involves the loss of a lot of blood through urine. A kidney biopsy is often required to diagnose the underlying pathology in patients with suspected glomerular disease. The goal of the project is to build a clinical support system that predicts the disease using clinical features, thus reducing the need for a kidney biopsy. Since the data collection is scarce, a synthetic data set can help in a better understanding of the disease from the clinical features.

(2) MIMIC: The feature set and the expected correlations are obtained in consultation with trauma experts at a Dallas hospital.

All experiments were run on a 64-bit Intel(R) Xeon(R) CPU E5-2630 v3 server. Both the generator and discriminator are neural networks with 4 hidden layers. To measure the quality of the generated data we make use of the train on synthetic, test on real (TSTR) method proposed in [12], with gradient boosting (100 estimators, learning rate 0.01) as the underlying model. We train the GAN for 10K epochs and provide correlation advice every 1K iterations.
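For concreteness, the TSTR protocol with the settings above might look like the following scikit-learn sketch; the function and variable names are ours, and the metric set mirrors the columns of Table 1.

```python
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import (average_precision_score, f1_score,
                             recall_score, roc_auc_score)

def tstr(synth_X, synth_y, real_X, real_y):
    """Train on synthetic data, test on real data (TSTR, [12])."""
    clf = GradientBoostingClassifier(n_estimators=100, learning_rate=0.01)
    clf.fit(synth_X, synth_y)                  # train on the generated data
    pred = clf.predict(real_X)                 # evaluate on the real data
    score = clf.predict_proba(real_X)[:, 1]
    return {"Recall": recall_score(real_y, pred),
            "F1": f1_score(real_y, pred),
            "AUC-ROC": roc_auc_score(real_y, score),
            "AUC-PR": average_precision_score(real_y, score)}
```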
Table 1: TSTR results (rounded to ≈ 3 decimal places). N/A in Nephrotic Syndrome denotes that all generated labels were of a single class (0 in our case) and thus we were not able to run the discriminative algorithm in the TSTR method. 𝐺𝐴 and 𝐵𝐴 denote good and bad advice to our HA-GAN model respectively.

Data set   Method      Recall   F1      AUC-ROC   AUC-PR
NS         GAN         0.584    0.666   0.509     0.911
NS         HA-GAN𝐵𝐴    0.42     0.511   0.518     0.886
NS         medGAN      N/A      N/A     N/A       N/A
NS         medWGAN     N/A      N/A     N/A       N/A
NS         medBGAN     N/A      N/A     N/A       N/A
NS         HA-GAN𝐺𝐴    1.0      0.943   0.566     0.947
MIMIC      GAN         0.122    0.119   0.495     0.174
MIMIC      HA-GAN𝐵𝐴    0.285    0.143   0.459     0.235
MIMIC      medGAN      0.374    0.163   0.478     0.279
MIMIC      medWGAN     0.0      0.0     0.5       0.562
MIMIC      medBGAN     0.0      0.0     0.5       0.562
MIMIC      HA-GAN𝐺𝐴    0.979    0.263   0.598     0.567

Table 1 shows the results of the TSTR method with data generated with (HA-GAN𝐺𝐴) and without advice (GAN). The data generated with advice has higher TSTR performance than the data generated without advice across all data sets and all metrics. Thus, to answer Q1: providing advice to generative adversarial networks captures the relationships between features better and is thus able to generate better quality synthetic data.

Learning with less data: GANs with advice are especially impressive on the nephrotic syndrome data, which consists of only 50 examples and is thus very small compared to the number of samples typically required to train a GAN model, yet our method wins across all metrics. We thus realize an important property of incorporating human guidance in the GAN model and can answer Q2 affirmatively. The use of advice opens up the potential of using GANs in the presence of sparse data samples.

Effect of bad advice: Table 1 also shows the results for data generated with bad advice (HA-GAN𝐵𝐴). To simulate bad advice, we follow a simple process: if the advice says that the correlation between two features should be high, we set the corresponding entry of Ĉ to 0, and if the advice says that the correlation should be low, we set the entry of Ĉ to either 1 or −1, based on whether the original correlation is positive or negative. Thus, given a correlation matrix

\[
\mathbf{C} = \begin{bmatrix} 1 & 0.2 & 0.3 \\ 0.2 & 1 & 0.07 \\ 0.3 & 0.07 & 1 \end{bmatrix}
\tag{7}
\]

suppose the advice says that we need to increase the correlation coefficient between feature 1 and feature 2. The new correlation matrix after bad advice can be calculated as

\[
\mathbf{C} = \begin{bmatrix} 1 & 0.2 & 0.3 \\ 0.2 & 1 & 0.07 \\ 0.3 & 0.07 & 1 \end{bmatrix}
\qquad
\mathbf{A} = \begin{bmatrix} 1 & \lambda & 1 \\ \lambda & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix}
\tag{8}
\]

\[
\hat{\mathbf{C}} = \mathbf{C} \odot \mathbf{A} = \begin{bmatrix} 1 & 0.2 & 0.3 \\ 0.2 & 1 & 0.07 \\ 0.3 & 0.07 & 1 \end{bmatrix} \odot \begin{bmatrix} 1 & \lambda & 1 \\ \lambda & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix}
\tag{9}
\]

where λ is the factor by which the correlation value is to be augmented. Since the advice asks to increase the correlation, we set λ = 0. Thus,

\[
\hat{\mathbf{C}} = \begin{bmatrix} 1 & 0.2 & 0.3 \\ 0.2 & 1 & 0.07 \\ 0.3 & 0.07 & 1 \end{bmatrix} \odot \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 0.0 & 0.3 \\ 0.0 & 1 & 0.07 \\ 0.3 & 0.07 & 1 \end{bmatrix}
\tag{10}
\]

Similarly, if the advice says that we need to decrease the correlation coefficient between feature 1 and feature 3, we set λ = 1/feat_val, the reciprocal of the original correlation value (here 1/0.3), pushing the entry to ±1:

\[
\hat{\mathbf{C}} = \begin{bmatrix} 1 & 0.2 & 0.3 \\ 0.2 & 1 & 0.07 \\ 0.3 & 0.07 & 1 \end{bmatrix} \odot \begin{bmatrix} 1 & 1 & \frac{1}{0.3} \\ 1 & 1 & 1 \\ \frac{1}{0.3} & 1 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 0.2 & 1.0 \\ 0.2 & 1 & 0.07 \\ 1.0 & 0.07 & 1 \end{bmatrix}
\tag{11}
\]

As the results in Table 1 show, giving bad advice adversely affects the performance, thereby answering Q3.
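A minimal sketch of this bad-advice simulation, reusing the illustrative advice encoding from the earlier sketch:

```python
import numpy as np

def inject_bad_advice(corr, advice):
    """Flip the advice: advised-high pairs are zeroed out, advised-low
    pairs are pushed to +/-1 (Eqs. (10)-(11))."""
    c_hat = corr.copy()
    for (i, j), direction in advice.items():
        if direction == 'up':                    # should be high -> set to 0
            c_hat[i, j] = c_hat[j, i] = 0.0
        else:                                    # should be low -> set to +/-1
            c_hat[i, j] = c_hat[j, i] = np.sign(c_hat[i, j])
    return c_hat

C = np.array([[1.0, 0.2, 0.3],
              [0.2, 1.0, 0.07],
              [0.3, 0.07, 1.0]])
# corr(0,1) -> 0.0 and corr(0,2) -> 1.0, matching Eqs. (10) and (11)
print(inject_bad_advice(C, {(0, 1): 'up', (0, 2): 'down'}))
```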
                                                                                     19-1-0391. Any opinions, findings, and conclusion or recommenda-
The nephrotic syndrome and MIMIC data sets are relatively unbal-
                                                                                     tions expressed in this material are those of the authors and do not
anced with a pos to neg ratio of ≈ 8:1 and 1:7 respectively. Most
                                                                                     necessarily reflect the view of the DARPA or the US government.
of the medical data sets, except highly curated data sets, are un-
balanced. A data generator model should be able to handle this
imbalance. Since our method explicitly focuses on the correlations                   REFERENCES
between features and generates better quality data based on such                      [1] Grigory Antipov, Moez Baccouche, and Jean-Luc Dugelay. 2017. Face aging with
                                                                                          conditional generative adversarial networks. In ICIP.
relationships between features, our method is quite robust to the                     [2] Martin Arjovsky and Leon Bottou. 2017. Towards principled methods for training
imbalance in the underlying data. This can be seen in the results                         generative adversarial networks. In ICLR.
 [3] Martin Arjovsky, Soumith Chintala, and Léon Bottou. 2017. Wasserstein gan.           [36] V. K. Mansinghka, C. Kemp, J. B. Tenenbaum, and T. L. Griffiths. 2006. Structured
     ICML (2017).                                                                              Priors for Structure Learning. In UAI.
 [4] Mrinal Kanti Baowaly, Chia-Ching Lin, Chao-Lin Liu, and Kuan-Ta Chen. 2019.          [37] Xudong Mao, Qing Li, Haoran Xie, Raymond YK Lau, Zhen Wang, and Stephen
     Synthesizing electronic health records using improved generative adversarial              Paul Smolley. 2017. Least squares generative adversarial networks. In ICCV.
     networks. JAMA (2019).                                                               [38] Tom M Mitchell. 1980. The need for biases in learning generalizations. Depart-
 [5] Darius Braziunas and Craig Boutilier. 2006. Preference elicitation and generalized        ment of Computer Science, Laboratory for Computer Science Research, Rutgers
     additive utility. In AAAI.                                                                Univ. New Jersey.
 [6] Anna L Buczak, Steven Babin, and Linda Moniz. 2010. Data-driven approach             [39] Takeru Miyato, Toshiki Kataoka, Masanori Koyama, and Yuichi Yoshida. 2018.
     for creating synthetic electronic medical records. BMC medical informatics and            Spectral normalization for generative adversarial networks. ICLR (2018).
     decision making (2010).                                                              [40] Klemen Naveršnik and Klemen Rojnik. 2012. Handling input correlations in
 [7] Jim Burridge. 2003. Information preserving statistical obfuscation. Statistics and        pharmacoeconomic models. Value in Health (2012).
     Computing (2003).                                                                    [41] P. Odom, T. Khot, R. Porter, and S. Natarajan. 2015. Knowledge-Based Proba-
 [8] Edward Choi, Siddharth Biswal, Bradley Malin, Jon Duke, Walter F Stewart,                 bilistic Logic Learning. In AAAI.
     and Jimeng Sun. 2017. Generating Multi-label Discrete Patient Records using          [42] Phillip Odom and Sriraam Natarajan. 2015. Active advice seeking for inverse
     Generative Adversarial Networks. In MLHC.                                                 reinforcement learning. In AAAI.
 [9] Corinna Cortes and Vladimir Vapnik. 1995. Support-vector networks. Machine           [43] Phillip Odom and Sriraam Natarajan. 2018. Human-guided learning for proba-
     Learning (1995).                                                                          bilistic logic models. Frontiers in Robotics and AI (2018).
[10] Ivo D Dinov. 2016. Volume and value of big healthcare data. Journal of medical       [44] Michela Paganini, Luke de Oliveira, and Benjamin Nachman. 2018. Calo-
     statistics and informatics (2016).                                                        GAN: Simulating 3D high energy particle showers in multilayer electromagnetic
[11] Cynthia Dwork. 2008. Differential privacy: A survey of results. In TAMS.                  calorimeters with generative adversarial networks. Physical Review D (2018).
[12] Cristóbal Esteban, Stephanie L Hyland, and Gunnar Rätsch. 2017. Real-valued          [45] Alec Radford, Luke Metz, and Soumith Chintala. 2016. Unsupervised represen-
     (medical) time series generation with recurrent conditional gans. arXiv preprint          tation learning with deep convolutional generative adversarial networks. ICLR
     arXiv:1706.02633 (2017).                                                                  (2016).
[13] Maayan Frid-Adar, Eyal Klang, Michal Amitai, Jacob Goldberger, and Hayit             [46] Ernest M Scheuer and David S Stoller. 1962. On the generation of normal random
     Greenspan. 2018. Synthetic data augmentation using GAN for improved liver                 vectors. Technometrics (1962).
     lesion classification. In ISBI.                                                      [47] Bernhard Schölkopf, Patrice Simard, Alex J Smola, and Vladimir Vapnik. 1998.
[14] Glenn M Fung, Olvi L Mangasarian, and Jude W Shavlik. 2003. Knowledge-based               Prior knowledge in support vector kernels. In Advances in neural information
     support vector machine classifiers. In NIPS.                                              processing systems. 640–646.
[15] Kuzman Ganchev, Jennifer Gillenwater, Ben Taskar, et al. 2010. Posterior regular-    [48] Rittika Shamsuddin, Barbara M Maweu, Ming Li, and Balakrishnan Prabhakaran.
     ization for structured latent variable models. JMLR (2010).                               2018. Virtual patient model: an approach for generating synthetic healthcare time
[16] Ian Goodfellow, Yoshua Bengio, and Aaron Courville. 2016. Deep learning.                  series data. In ICHI.
[17] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley,        [49] Jude W Shavlik and Geoffrey G Towell. 1989. Combining explanation-based
     Sherjil Ozair, Aaron Courville, and Yoshua Bengio. 2014. Generative adversarial           learning and artificial neural networks. In Proceedings of the sixth international
     nets. In NIPS.                                                                            workshop on Machine learning. Elsevier.
[18] Peter Groves, Basel Kayyali, David Knott, and Steve Van Kuiken. 2016. The’big        [50] Geoffrey G Towell and Jude W Shavlik. 1994. Knowledge-based artificial neural
     data’revolution in healthcare: Accelerating value and innovation. (2016).                 networks. Artificial intelligence (1994).
[19] John T Guibas, Tejpal S Virdi, and Peter S Li. 2017. Synthetic medical images        [51] Yan Wang, Biting Yu, Lei Wang, Chen Zu, David S Lalush, Weili Lin, Xi Wu, Jiliu
     from dual generative adversarial networks. arXiv preprint arXiv:1709.01872                Zhou, Dinggang Shen, and Luping Zhou. 2018. 3D conditional generative adver-
     (2017).                                                                                   sarial networks for high-quality PET image estimation at low dose. NeuroImage
[20] Ishaan Gulrajani, Faruk Ahmed, Martin Arjovsky, Vincent Dumoulin, and Aaron C             (2018).
     Courville. 2017. Improved training of wasserstein gans. In NIPS.                     [52] Zongwei Wang, Xu Tang, Weixin Luo, and Shenghua Gao. 2018. Face aging with
[21] Haroun Habeeb, Ankit Anand, Mausam Mausam, and Parag Singla. 2017. Coarse-                identity-preserved conditional generative adversarial networks. In CVPR.
     to-fine lifted MAP inference in computer vision. In IJCAI.                           [53] Jiajun Wu, Chengkai Zhang, Tianfan Xue, Bill Freeman, and Josh Tenenbaum.
[22] Zhiting Hu, Zichao Yang, Russ R Salakhutdinov, LIANHUI Qin, Xiaodan Liang,                2016. Learning a probabilistic latent space of object shapes via 3d generative-
     Haoye Dong, and Eric P Xing. 2018. Deep Generative Models with Learnable                  adversarial modeling. In NIPS.
     Knowledge Constraints. In NeurIPS.                                                   [54] S. Yang and S. Natarajan. 2013. Knowledge Intensive Learning: Combining
[23] Ronald L Iman and William-Jay Conover. 1982. A distribution-free approach to              Qualitative Constraints with Causal Independence for Parameter Learning in
     inducing rank correlation among input variables. Communications in Statistics-            Probabilistic Models. In ECMLPKDD.
     Simulation and Computation (1982).                                                   [55] Jun-Yan Zhu, Taesung Park, Phillip Isola, and Alexei A Efros. 2017. Unpaired
[24] Alistair EW Johnson, Tom J Pollard, Lu Shen, H Lehman Li-wei, Mengling Feng,              image-to-image translation using cycle-consistent adversarial networks. In ICCV.
     Mohammad Ghassemi, Benjamin Moody, Peter Szolovits, Leo Anthony Celi,
     and Roger G Mark. 2016. MIMIC-III, a freely accessible critical care database.
     Scientific data (2016).
[25] Tero Karras, Samuli Laine, and Timo Aila. 2019. A style-based generator archi-
     tecture for generative adversarial networks. In CVPR.
[26] Harold William Kuhn and Albert William Tucker. 1953. Contributions to the
     Theory of Games.
[27] Gautam Kunapuli, Phillip Odom, Jude W Shavlik, and Sriraam Natarajan. 2013.
     Guiding autonomous agents to better behaviors through human advice. In ICDM.
[28] Quoc V Le, Alex J Smola, and Thomas Gärtner. 2006. Simpler knowledge-based
     support vector machines. In ICML.
[29] Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. 2015. Deep learning. Nature
     (2015).
[30] Minjun Li, Haozhi Huang, Lin Ma, Wei Liu, Tong Zhang, and Yugang Jiang.
     2018. Unsupervised image-to-image translation with stacked cycle-consistent
     adversarial networks. In ECCV.
[31] Yaping Li, Minghua Chen, Qiwei Li, and Wei Zhang. 2011. Enabling multilevel
     trust in privacy preserving data mining. TKDE (2011).
[32] Yujia Li, Kevin Swersky, and Rich Zemel. 2015. Generative moment matching
     networks. In ICML.
[33] Ming-Yu Liu, Thomas Breuel, and Jan Kautz. 2017. Unsupervised image-to-image
     translation networks. In NIPS.
[34] Ming-Yu Liu and Oncel Tuzel. 2016. Coupled generative adversarial networks. In
     NIPS.
[35] Faisal Mahmood, Richard Chen, and Nicholas J Durr. 2018. Unsupervised reverse
     domain adaptation for synthetic medical images via adversarial training. IEEE
     transactions on medical imaging (2018).