=Paper= {{Paper |id=Vol-2964/article_59 |storemode=property |title=TextureVAE: Learning Interpretable Representations of Material Microstructures Using Variational Autoencoders |pdfUrl=https://ceur-ws.org/Vol-2964/article_59.pdf |volume=Vol-2964 |authors=Avadhut Sardeshmukh,Sreedhar Reddy,Bp Gautham,Pushpak Bhattacharyya |dblpUrl=https://dblp.org/rec/conf/aaaiss/SardeshmukhRPB21 }} ==TextureVAE: Learning Interpretable Representations of Material Microstructures Using Variational Autoencoders== https://ceur-ws.org/Vol-2964/article_59.pdf
             TextureVAE : Learning Interpretable Representations of Material
                     Microstructures Using Variational Autoencoders




                            Abstract
  We propose a variational autoencoder model based on style
  loss for learning representations of material microstructure
  images. We show using latent space traversals that the model
  captures important attributes of microstructures that are re-
  sponsible for mechanical properties of materials and is ca-                  (a) Cast iron           (b) Ultra-high carbon steel
  pable of generating microstructures with particular attributes.
  We discuss how the latent vectors can be used to establish a       Figure 1: Example Microstructures - The information of in-
  linkage between structure and properties and enable inverse        terest varies with the material system under consideration.
  inference which is crucial for designing materials and prod-       (Credit: (a) Tewary et al. [to be published], (b) (DeCost et al.
  ucts with target properties.                                       2017)

                     1    Introduction
When a material is put through a manufacturing process,              a random vector of length r has both its ends in the same
it’s internal structure is modified, which in turn affects the       phase. These are the most widely used correlation functions
properties of the material. Materials scientists seek answers        for formal mathematical characterization of microstructures.
to questions such as what processing is required to achieve          However, it has been shown that different microstructures
the target properties and how do the structure and properties        may, under some conditions, lead to similar 2-point corre-
change with the process parameters. It is well known that            lations (Cang et al. 2018). And beyond n = 2 (i.e., higher-
these mappings are complex, highly non-linear and multiple           order spatial correlations), n-point statistics quickly become
processing paths can lead to the same property. These map-           intractable. Hence, these methods are not easily extensible
pings can be best described through the space of structures          in general.
(Kalidindi 2015). The most commonly available description               Another common alternative is using physical descriptors.
of the structure is in the form of microscopic images, which         For example, consider the two microstructures in Figure 1.
is known as the microstructure (because the length scale is          The first one is a cast iron microstructure containing spher-
roughly 10−6 m). Obtaining compact representations of mi-            ical grains of graphite, and the ferrite phase in background.
crostructure images 1 is therefore crucial for building robust       Here, some physical descriptors of interest are the sizes of
process-structure-property linkages.                                 the spherical grains, their density and so on. Whereas the
   The microstructure contains a lot of information such as          second one is an ultra-high carbon steel microstructure con-
grain size distribution, volume fractions of different phases        taining pearlite phase as an alternating layer with the ferrite
and so on. Depending upon the material system under con-             phase. Here the important physical descriptors are the ori-
sideration, very different types of features and information         entation of the lamellar pattern, the inter-lamellar spacing
is relevant. Traditionally, materials scientists have used sta-      and so on. One needs to employ specific image processing
tistical methods such as n-point correlation functions and           to extract physical descriptors from given microstructures.
Gaussian random fields for obtaining representations of mi-          Also, such a representation is specific to the material system
crostructures. The n-point correlation functions capture the         and is tightly tied to the expertise of a materials scientist in
degree of spatial correlation among the locations and con-           selecting the right descriptors.
stituents in a probabilistic sense (Kalidindi 2015). For ex-            Researchers are looking at the recent methods of repre-
ample, given a microstructure containing two phases, the 2-          sentation learning from the deep learning literature as an al-
point statistics can be used to encode the probability that          ternative to the above-mentioned microstructure representa-
Copyright c 2021for this paper by its authors. Use permitted un-     tion methods. A major challenge in leveraging these tech-
der Creative Commons License Attribution 4.0 International (CC       niques is the scarcity of data. Transfer learning can be used
BY 4.0)                                                              to mitigate this up to some extent. However, a further chal-
   1
     For simplicity, henceforth referred to as just microstructure   lenge is that it is even more difficult to get microstructures
with associated property values, owing to the high cost and       further work. Liu et al. (Liu et al. 2015) propose a design
time required to do the testing. Deep generative models such      method for inferring structures with target properties using
as the variational autoencoders aim to learn good latent rep-     Bayesian optimization around the GAN generator. The au-
resentations in an unsupervised manner. In absence of a su-       thors talk about the possibility of using the implicitly learned
pervisory signal, these models try to learn latent representa-    representation from the GAN discriminator for a structure-
tions from which the original inputs can be reconstructed         property model, but do not present any study on this. Proba-
most accurately. Such a representation is expected to en-         bly the closest to our work is Cang et al. (Cang et al. 2018),
code all non-redundant information from the image. Fur-           where a variational autoencoder model with style loss is pro-
ther, these models are capable of synthesizing images that        posed for generating microstructures of sandstone. The au-
are realistic and statistically equivalent to the training im-    thors show that the generated microstructures are more pre-
ages. Synthesis is a common goal to support computational         dictive of the properties (Young’s modulus, diffusivity and
design because the cost and difficulty of experimental char-      permeability) than those generated using Gaussian random
acterization is often prohibitively high (Hsu et al. 2020). In-   field method. They add the style loss to the vanilla VAE ob-
spired by this, we propose a variational autoencoder archi-       jective function, retaining the original reconstruction loss.
tecture to learn low-dimensional microstructure representa-       However, the vanilla VAE reconstruction loss is not suitable
tions. We demonstrate that the learned latent representation      for microstructure images (see section 3.2 for more details).
indeed encodes important features, with a use-case in which       So we completely replace the reconstruction loss with the
such features are known in advance. The representation can        style loss. We also show that physically significant factors
be physically interpreted in that individual latent dimensions    of variation are explicitly encoded in the learned representa-
correspond to different features which are known to be im-        tion. To the best of our knowledge, ours is the first work on a
portant from the physics knowledge. Such a representation is      variational autoencoder model for microstructure generation
therefore expected to work well for modeling the structure-       and interpretable representation learning.
property linkages.
   Our key contribution is an interpretable microstructure                            3    Methodology
representation method that                                        3.1   Variational Autoencoders
• captures physically significant factors of variation which      Variational Autoencoders (Kingma and Welling 2013) are
  are primarily responsible for the mechanical properties of      typically used to learn latent representations of input sam-
  the material                                                    ples in an unsupervised manner. The underlying graphical
• can be used to generate different microstructures by vary-      model is pθ (x, z) = pθ (z)pθ (x|z), where x is the observed
  ing these factors                                               variable (input sample) and z is latent variable (the represen-
                                                                  tation). Given an input sample, the latent variables can be in-
                    2   Related work                              ferred from the posterior p(z|x). Computing this distribution
                                                                  is a hard problem due to the intractable partition function re-
With the recent advances in machine learning, there is a re-      quired in applying Bayes’ theorem. In variational inference,
newed interest among materials scientists to leverage these       an approximate posterior distribution qφ (z|x) from a known
advances for material microstructure modeling. Bostanabad         family is found by minimizing the KL divergence from the
et al. (Bostanabad et al. 2018) provides a detailed review of     true posterior. That is, find q ∗ such that
the state-of-the-art in computational characterization of ma-
terial microstructure. We discuss some of the more recent                   q ∗ = argminqφ DKL (qφ (z|x)kpθ (z|x))
works on application of deep learning for this task.              However, computing the KL divergence also involves the
   In some recent works, generative adversarial nets (GANs)       same intractable integrals as in posterior computation. So
and variational autoencoders have been used for material          instead of minimizing it, another tractable quantity called
microstructure generation. Often the focus is on generation       the Evidence Lower Bound (ELBO) derived from the above
rather than representation learning. For example, (Banko          equation is maximized and it is shown that maximizing
et al. 2020) use a conditional GAN to generate microstruc-        ELBO is equivalent to minimizing the KL divergence. The
tures of thin films conditioned on process parameters and         ELBO is defined as:
chemical composition. While Hsu et al. (Hsu et al. 2020)
use GANs to generate small patches of 3D microstructure of         L = −DKL (qφ (z|x)kpθ (z)) + Eqφ (z|x) [log pθ (x|z)] (1)
solid oxide fuel cell anodes. They show that the properties       The above loss function can be roughly understood as fol-
computed by numerical simulations on the generated mi-            lows : The second term is the expected log-likelihood of
crostructures closely match the experimental observations.        getting back the same x starting with the inferred z from the
Chun et al. (Chun et al. 2020) use a patch-based GAN to           approximate posterior qφ (z|x). It is often called the recon-
generate microstructures of heterogeneous energetic materi-       struction loss. The first term is a regularizer that penalizes
als (propellants explosives and pyrotechnics). The input to       posteriors very different from the prior.
their model consists of a pair of vectors for each grid loca-        The prior pθ (z) and the posterior qφ (z|x) are generally
tion (patch). They show that during generation, the two vec-      assumed to be Gaussian. The distribution qφ is parameter-
tors can be used to control overall morphology. However the       ized by an inference network and resembles an encoder. It
intuitive meanings of individual dimensions of these vec-         outputs the µ and σ of the posterior for a given input sam-
tors are not clear and the authors point to this as possible      ple (i.e. qφ (z|xi ) = N (z; µxi , σxi )). The distribution pθ is
parameterized by a generator network and resembles a de-             a major improvement over state-of-the-art in texture syn-
coder. It outputs a sample from the distribution pθ given a          thesis, it required an optimization step for each genera-
latent vector z.                                                     tion. This method has been extended to feed-forward ap-
                                                                     proaches that learn a separate generator network to mini-
3.2   Texture-VAE                                                    mize the style loss, for example (Johnson, Alahi, and Fei-
However, the reconstruction loss in the original VAE objec-          Fei 2016), (Ulyanov, Vedaldi, and Lempitsky 2017) and
tive is not suitable to model the mismatch between a mi-             (Li et al. 2017). The generator transforms a random in-
crostructure image and its reconstruction. We discuss the            put vector (typically a standard Gaussian) into texture im-
reasons as follows. In the decoder network, the reconstruc-          ages. However, these methods are not well suited for repre-
tion x̂ is a deterministic function of z. Now consider the re-       sentation learning because they focus on generation. Most
construction loss : Eqφ (z|x) [log pθ (x|z)]. With continuous        of these methods require learning one network per texture
valued outputs such as images, the generator distribution is         (or per style). The difficulty in using the same network for
generally assumed to be Gaussian whose mean µ = f (z; θ)             multiple styles seems to stem from the fact that the Gram
is computed by the generator network. Thus, µ = x̂. Since            matrices of different styles have very different scales and
the reconstruction loss is an expectation of log of a Gaus-          the generator adapts itself to a particular style. There have
sian density, it is equivalent to (x − µ)2 (i.e. (x − x̂)2 ) up to   been some works on methods capable of learning multiple
some constants. Hence the reconstruction loss is equivalent          styles/textures with a single model and they seem to focus
to pixel-wise mean squared error. It has been argued in many         on normalizing the styles and forcing a correlation between
works that pixel-wise comparison is not capable of captur-           the random input and generated image. Variational autoen-
ing perceptual image similarity (see for example (Ding et al.        coders naturally address this issue with the encoder learning
2020), (Dosovitskiy and Brox 2016) or (Larsen et al. 2016)).         different representations for different styles. Lastly, while
This is especially true in case of microstructures, which are        the concatenation of Gram matrices characterizes a texture
a type of texture images (they contain randomly repeating            well, it may not be of much use as a representation vector
patterns such as spheres, lines and so on). Imagine a stripes        by itself for other downstream tasks because it is very high
pattern and another one, shifted one stripe right. A human           dimensional (often, more dimensions than the image itself).
instantly understands that they are essentially “the same tex-          Combining these ideas, we propose using the style loss to
ture”, but the pixel-wise difference could be huge. To sum-          train a variational autoencoder. In particular we replace the
marize, the reconstruction loss term in equation 1 is not suit-      reconstruction loss (i.e. the second term) in equation 1 with
able for texture images since it leads to a pixel-by-pixel com-      the style loss (given in equation 2). We keep the first term
parison between input and reconstructed image. This moti-            as it is - this regularizes the latent representation by forc-
vates replacing the reconstruction loss with a better suited         ing all posteriors to be not far from the prior. With a stan-
measure for textures.                                                dard Gaussian prior with diagonal covariance matrix, this
   Since textures are different from natural images that gen-        term encourages a representation with statistically indepen-
erally contain objects, special considerations are needed for        dent dimensions, which is expected to be interpretable. This
representing textures. In texture synthesis literature, Gatys        is discussed in more detail in section 4.3. We call our model
et al. (Gatys, Ecker, and Bethge 2015) proposed to use the           the texture-VAE. We show that a single VAE model trained
feature correlations computed from different layers of a pre-        with style loss can be used for multiple textures.
trained network (e.g. VGG19) to represent textures. The fea-
ture correlations at layer l are encoded by the Gram ma-
trix G(l) , whose elements are inner products between fea-           3.3   Model architecture
ture maps at that layer. If layer l has Cl feature maps of size
Wl × Hl then:                                                        The architecture of our VAE model is shown in Figure 2.
                        Glij = Σk Fikl  l
                                       Fjk                           We use the pre-trained VGG19 network (Simonyan and Zis-
                                                                     serman 2014) for computing the style loss and as the en-
Where, Ml = Wl ∗ Hl and F l is the Cl × Ml matrix with the           coder. The last two fully connected layers of the encoder that
flattened feature maps as rows. A texture can then be repre-         compute µ and σ of the posterior are trained from scratch.
sented by the concatenation of all Gram matrices. Given an           For the remaining layers, the pre-trained weights are used
input texture image, the authors propose a method to gen-            without fine-tuning. The decoder contains four blocks of de-
erate similar textures by starting with a random white noise         convolution followed by nearest-neighbor upsampling and
image and minimizing the squared difference between the              LeakyReLU nonlinearity with slope 0.3. It has been previ-
representations. Note that the optimization is with respect to       ously observed in literature that for generation, explicit up-
the image pixels; the weights of the pre-trained network are         sampling works better than using fractional strided convo-
not changed. They call the squared difference between Gram           lutions (Odena, Dumoulin, and Olah 2016). We use a filter
matrix concatenations as the “style loss”:                           size of 3 × 3 throughout.
                                                                        For computing the style loss, we use the pooling layer
                                                       
                    L
                    X           1    X
                                                                     at each scale, i.e. pool 1 to pool 4 and conv 1 1. Gatys et
   Lstyle (x, x̂) =    wl  2 2          (Gl − Ĝlij )2  (2)
                            4Cl Ml i,j ij                            al. (Gatys, Ecker, and Bethge 2015) recommend using the
                   l=0
                                                                     convolutional layers instead of pooling. But we found that
While this approach of “generation-by-optimization” was              in our case, pooling layers worked better.
                                                                           (a)       (b)       (c)        (d)       (e)    (f)

                                                                   Figure 3: Example patch from each microstructure. (a) small
                                                                   dense spheres, (b) small sparse spheres, (c) large spheres, (d)
                                                                   intermediate (e) fine flakes and (f) thick flakes

                                                                   Table 1: Average similarity between reconstructed and orig-
                                                                   inal patches

                                                                                                           Score
                                                                                  Latent Dims
                                                                                                     DISTS STSIM-2
                                                                                  64                 0.7031    0.7592
Figure 2: The variational autoencoder model with style loss.                      32                 0.7257    0.7640
Lstyle is the sum of squared differences between Gram ma-                         16                 0.6995    0.7609
trices at all layers and denotes the inner product of feature                     8                  0.6962    0.7591
maps (F l )T F l
                                                                     The original microstructures were 2048x1532. We use
                    4    Experiments                               patches of 128x128 cropped by sliding a window with stride
                                                                   50 for our training.
4.1   Dataset
We use microstructures of cast iron to demonstrate the ca-         4.2      Evaluation
pabilities of our texture-VAE model. These microstructures         While there are many metrics of perceptual image similar-
were produced in a separate study by Ujjal Tewary et al. (to       ity, very few of them are focused on texture images. Two
be published) of microstructure and mechanical properties          recently proposed metrics of texture similarity that seem to
of cast iron produced using sand casting. In the process of        be best suited for our evaluation are - i) Deep Image Struc-
sand casting, molten iron ore mixed with C, Si, Mg etc. is         ture and Texture Similarity - DISTS (Ding et al. 2020) and
poured into molds of desired size and kept in sand for cool-       ii) Structural Texture Similarity - STSIM (Ehmann, Pappas,
ing. In case of cylindrical molds, the sample cools from the       and Neuhoff 2013). The DISTS score consists of two terms -
surface to its core, so the radius governs the cooling rate.       first one compares the means of features maps (from a vari-
In the study, cylindrical castings of various radii (resulting     ant of pre-trained VGG) and the second one computes cross
in different cooling rates) were made using sand casting.          covariance between them. The two terms are combined us-
Microstructures of these cylindrical castings were then cap-       ing weights tuned to match human judgments and be invari-
tured using a scanning electron microscope. The original           ant to re-sampled patches from the same texture. The STSIM
study consisted of many experiments but we describe here           metric is based on a similar modification of Structural Sim-
only those that correspond to the microstructures we used.         ilarity Metric (SSIM) to completely avoid pixel-by-pixel
   We use the microstructures of 12 samples resulting from         comparison, but is computed in the Fourier spectrum. We
combination of four cooling rates corresponding to cylin-          used the authors’ implementation of DISTS2 . STSIM has
ders of radius 12, 24, 36 and 48mm and three compositions          several configurations and we compute the STSIM-2 metric
(mainly varying Magnesium - 0, 0.025 and 0.045% weight).           using a publicly available implementation3 . Table 1 shows
The images were captured at 100µm length scale, without            the average similarity over 100 instances between original
any etching. The microstructures mainly contain ferrite and        patches and reconstructions from texture-VAE models with
graphite phases. The cooling rate and initial composition af-      number of latent vector dimensions 8, 16, 32 and 64. The
fect the grain size, density and morphology (i.e. appearance       similarity scores are on a scale of 0 to 1, 1 being the high-
of the graphite - spherical, flaky or both) of the resulting mi-   est. The similarity seems to slightly increase with increas-
crostructure. Figure 3 shows small 128x128 patches from a          ing number of latent dimensions, but beyond 64 it didn’t in-
few microstructures. We have chosen these so that the vari-        crease, so we stopped there.
ations in grain size (small to large), density (low to high)          One way to qualitatively evaluate a VAE model is to look
and morphology (spherical, flakes and intermediate) can be         at the reconstructed and newly generated samples. Figure 4
clearly seen.                                                      shows some example reconstructions while Figure 5 shows
   Out of the 12, 6 samples corresponding to the lowest and        some randomly generated samples from texture-VAE model
highest cooling rates were also subjected to uni-axial com-        with 64 latent dimensions. Although the latent-32 model
pression to obtain their stress-strain behavior. We used all 12    had slightly higher similarity scores for reconstructions than
microstructures to train the texture-VAE model while the 6
                                                                      2
with property values were used for property prediction task               https://github.com/dingkeyan93/DISTS
                                                                      3
described in section 4.4.                                                 https://github.com/andreydung/Steerable-filter
                                                                    z[17]

                                                                    z[23]
  (a) Small spheres    (b) Large spheres     (c) Intermediate
                                                                    z[26]

                                                                    z[34]
   (d) Intermediate      (e) Fine flakes     (f) Thick flakes
                                                                                  (a) Starting with large spheres
              Figure 4: Example reconstructions                     z[17]

                                                                    z[23]

                                                                    z[26]
      (a)        (b)           (c)         (d)          (e)
                                                                    z[34]
            Figure 5: Randomly generated samples                                    (b) Starting with fine flakes

                                                                   Figure 6: Effect of varying latent dimensions, starting with
latent-64, the latter recovered finer details better, which are    different morphologies. Each row corresponds to one latent
important from the domain perspective (for example the             dimension. The leftmost image is the original patch and the
ferrite grain boundaries, explained in the next paragraph).        remaining are variations obtained by varying that particular
Hence we did all further experimentation with the latent-64        latent dimension
model. The left half of each image in Figure 4 is a patch
from the original microstructure - x, while the right half
is the reconstructed image - x̂ = Dec(Enc(x)). We have             tion since they seem to produce physically significant vari-
shown some representative examples from each microstruc-           ations that are visually discernible as well. Figure 6a shows
ture. It can be seen that even the minute structural details       the variations starting with a large-spheres microstructure,
such as the ferrite grain boundaries (the thin lines in the gray   whereas Figure 6b shows the variations starting with a fine-
portion) which are faintly visible only in the case of large       flakes microstructure. From the figure, dimension 17 seems
spherical grains in Figure 4b are reconstructed quite well.        to correspond to morphology, with lower values indicating
   These results show that the texture-VAE model is capable        flaky and higher values indicating spherical structures. Di-
of reconstructing input samples quite well across different        mensions 23 and 26 seem to correspond to the density and
textures. The randomly generated samples also span differ-         size respectively of spherical grains, with their values in-
ent textures and look structurally similar to the original ones.   creasing as we go from left to right. Whereas dimension
                                                                   34 seems to correspond to the density of flakes. From the
4.3    Interpretability                                            physics of Cast iron microstructures, it is known that grain
Variational autoencoders have been shown to recover factors        size and density are correlated - when the spherical grains
of variation in the training data ((Kingma and Welling 2013),      are large (or the flakes are thick), they are more likely to
(Higgins et al. 2017)). The first term in the learning objec-      be sparse. This correlation seems to be well-captured in the
tive of VAE encourages the posterior pθ (z|x) to be like the       variations of dimensions 23, 26 and 34.
prior p(z) which is a standard normal distribution with diag-
onal covariance matrix. That is, this term encourages the la-      4.4   Structure-Property linkage
tent dimensions to be statistically independent (Higgins et al.    As shown in Figure 6, for cast iron microstructures, some
2017). Such representations are easier to interpret and can        of the latent dimensions seem highly correlated with quan-
be more useful in downstream tasks ((Ridgeway 2016) and            tities such as grain size, morphology, grain density and so
(Bengio, Courville, and Vincent 2013)). We perform exper-          on. It is known that these factors have a profound impact
iments to show that the texture-VAE model recovers physi-          on the mechanical properties of cast iron. For example, the
cally significant factors of variation.                            spherical grains prevent a passing crack from further propa-
   Starting with an image x, we obtain its latent representa-      gating, so lead to higher strength. Whereas the flakes deflect
tion z = Enc(x). Then we choose a dimension i of z and             the crack into a number of other directions, so lead to brit-
vary it in the range [−4, 4] by choosing 10 equally spaced         tleness. Consequently, the representation is expected to lend
values, while keeping all other dimensions unchanged. That         itself to a more accurate property prediction model. In the
is, z 0 [i] = j, j ∈ linspace(−4, 4, 10) and z 0 [k] = z[k] for    following, we describe some experiments that support this
all other dimensions k. By decoding these z 0 vectors, we ob-      claim.
serve the variations in the image space. Figure 6 shows im-            As stated earlier, the stress-strain curves of 6 microstruc-
ages of two examples obtained by varying dimensions 17,            tures corresponding to the smallest and largest cooling rates
23, 26 and 34. These dimensions were chosen for illustra-          were available, from which we obtained ultimate tensile
           Table 2: Property prediction accuracy                           Table 3: Prediction of Ys - Generalization

       Method               UTS               Ys                                Method                  MAPE
                      R2      MAPE      R2     MAPE                             TextureVAE + SVR         8.02
       LinReg        0.89       20     0.88      14                             VGG19 + SVR             18.22
       SVR-RBF       0.97       10     0.98       5

                                                                                      5    Conclusion
                                                                  We have presented a variational autoencoder model to learn
                                                                  microstructure representations. The objective function is ob-
                                                                  tained by replacing the reconstruction loss in the vanilla
                                                                  VAE with the style loss. We applied the model on a set of
                                                                  experimental cast iron microstructures. Through latent space
                                                                  traversals, we showed that the learned representation explic-
                                                                  itly encodes factors of variation that are primarily respon-
                                                                  sible for the mechanical properties (such as ultimate ten-
                                                                  sile strength and yield strength). Consequently, the repre-
                                                                  sentation is highly predictive of mechanical properties. We
  Figure 7: Prediction of Ys for an unseen microstructure         showed that the regression model built using these repre-
                                                                  sentations can reasonably predict the properties for totally
                                                                  unseen morphologies.
                                                                     Since the learned representation is predictive of mechani-
strength and yield strength. We trained a regression model
                                                                  cal properties and some of its dimensions can be physically
from the latent representations of patches of these mi-
                                                                  interpreted, we expect that it can be used for inverse infer-
crostructures to the property values. Note that the property
                                                                  ence as well - i.e. predicting the structure required to get
values correspond to original full-size microstructure im-
                                                                  desired properties. A probabilistic model such as Bayesian
ages, whereas our model’s input size is 128x128. We assume
                                                                  network that can represent the joint distribution between la-
that all 128x128 patches cropped from the same microstruc-
                                                                  tent dimensions and properties can be used to infer the most
ture image have the same property value. A validation set
                                                                  probable values of the latent dimensions given the proper-
containing 20% of the patches was kept aside for evaluation.
                                                                  ties. The obtained latent vector can then be decoded using
Table 2 shows the R2 value and mean absolute percentage
                                                                  the VAE model to give the required microstructure. This is
error in prediction of ultimate tensile strength (UTS) and
                                                                  a direction we are pursuing as further work. We believe the
yield strength (Ys) on the validation set. The table shows that
                                                                  present work is a step towards a general framework for learn-
we get reasonably good accuracy even with a simple linear
                                                                  ing interpretable microstructure representations.
regression model, revealing that the learned representation
is highly predictive of the properties. With a more expres-
sive model such as support vector regression (with a radial                               References
basis kernel) the accuracy goes significantly higher, further     Banko, L.; Lysogorskiy, Y.; Grochla, D.; Naujoks, D.;
strengthening the belief in the predictive power of the repre-    Drautz, R.; and Ludwig, A. 2020. Predicting structure
sentation.                                                        zone diagrams for thin film synthesis by generative ma-
   To test generalization, we trained the SVR model for yield     chine learning. Communications Materials 1(1): 15. ISSN
strength using only five microstructures and used it to pre-      2662-4443. doi:10.1038/s43246-020-0017-2. URL https:
dict the yield strength for the sixth microstructure. Note that   //doi.org/10.1038/s43246-020-0017-2.
this is different from the above experiment on the valida-        Bengio, Y.; Courville, A.; and Vincent, P. 2013. Represen-
tion set. Here, the regression model does not see any patches     tation Learning: A Review and New Perspectives. IEEE
(and the property values) from the excluded microstructure.       Trans. Pattern Anal. Mach. Intell. 35(8): 17981828. ISSN
The missing microstructure corresponds to the lowest cool-        0162-8828. doi:10.1109/TPAMI.2013.50. URL https://doi.
ing rate which results in the largest spherical grains. Figure    org/10.1109/TPAMI.2013.50.
7 shows the histogram of predicted values on all patches of
this microstructure. It can be seen that the mean prediction      Bostanabad, R.; Zhang, Y.; Li, X.; Kearney, T.; Brinson, L.;
is near 550M pa. The true value found from experiments is         Apley, D.; Liu, W.; and Chen, W. 2018. Computational mi-
598M pa, so the prediction is off by about 8% not deviating a     crostructure characterization and reconstruction: Review of
lot from the 5% error on the validation set. We performed the     the state-of-the-art techniques. Progress in Materials Sci-
same experiment using latent representations obtained from        ence 95: 1–41.
unmodified, pre-trained VGG19 network. Table 3 shows that         Cang, R.; Li, H.; Yao, H.; Jiao, Y.; and Ren, Y. 2018. Im-
the texture-VAE representations generalize much better as         proving direct physical properties prediction of heteroge-
compared to pre-trained VGG19. We think that the reason           neous materials from imaging data via convolutional neu-
behind better generalization with our representation is that it   ral network and a morphology-aware generative model.
encodes physically significant attributes.                        Computational Materials Science 150: 212 – 221. ISSN
0927-0256. doi:https://doi.org/10.1016/j.commatsci.2018.         Larsen, A. B. L.; Snderby, S. K.; Larochelle, H.; and
03.074. URL http://www.sciencedirect.com/science/article/        Winther, O. 2016. Autoencoding beyond pixels using a
pii/S0927025618302337.                                           learned similarity metric. volume 48 of Proceedings of
Chun, S.; Roy, S.; Nguyen, Y. T.; Choi, J. B.; Udayku-           Machine Learning Research, 1558–1566. New York, New
mar, H. S.; and Baek, S. S. 2020. Deep learning for syn-         York, USA: PMLR. URL http://proceedings.mlr.press/v48/
thetic microstructure generation in a materials-by-design        larsen16.html.
framework for heterogeneous energetic materials. Scien-          Li, Y.; Fang, C.; Yang, J.; Wang, Z.; Lu, X.; and Yang, M.
tific reports 10(1): 13307–13307. ISSN 2045-2322. doi:           2017. Diversified Texture Synthesis with Feed-Forward Net-
10.1038/s41598-020-70149-0. URL https://pubmed.ncbi.             works. In 2017 IEEE Conference on Computer Vision and
nlm.nih.gov/32764643. 32764643[pmid].                            Pattern Recognition (CVPR), 266–274.
DeCost, B. L.; Hecht, M. D.; Francis, T.; Webler, B. A.;         Liu, R.; Kumar, A.; Chen, Z.; Agrawal, A.; Sundararagha-
Picard, Y. N.; and Holm, E. A. 2017. UHCSDB: Ultra-              van, V.; and Choudhary, A. 2015. A predictive machine
High Carbon Steel Micrograph DataBase. Integrating Ma-           learning approach for microstructure optimization and ma-
terials and Manufacturing Innovation 6(2): 197–205. URL          terials design. In Nature Scientific Reports, volume 5.
https://doi.org/10.1007/s40192-017-0097-0.
                                                                 Odena, A.; Dumoulin, V.; and Olah, C. 2016. Deconvolution
Ding, K.; Ma, K.; Wang, S.; and Simoncelli, E. P. 2020.          and Checkerboard Artifacts. Distill URL http://distill.pub/
Image Quality Assessment: Unifying Structure and Texture         2016/deconv-checkerboard/.
Similarity. CoRR abs/2004.07728. URL https://arxiv.org/
abs/2004.07728.                                                  Ridgeway, K. 2016. A Survey of Inductive Biases for Facto-
                                                                 rial Representation-Learning. CoRR abs/1612.05299. URL
Dosovitskiy, A.; and Brox, T. 2016. Generating Images            http://arxiv.org/abs/1612.05299.
with Perceptual Similarity Metrics based on Deep Net-
works. In Lee, D.; Sugiyama, M.; Luxburg, U.; Guyon,             Simonyan, K.; and Zisserman, A. 2014. Very Deep Convolu-
I.; and Garnett, R., eds., Advances in Neural Information        tional Networks for Large-Scale Image Recognition. CoRR
Processing Systems, volume 29, 658–666. Curran Asso-             abs/1409.1556.
ciates, Inc. URL https://proceedings.neurips.cc/paper/2016/      Ulyanov, D.; Vedaldi, A.; and Lempitsky, V. S. 2017. Im-
file/371bce7dc83817b7893bcdeed13799b5-Paper.pdf.                 proved Texture Networks: Maximizing Quality and Diver-
Ehmann, J.; Pappas, T.; and Neuhoff, D. 2013. Structural         sity in Feed-Forward Stylization and Texture Synthesis. In
Texture Similarity Metrics for Image Analysis and Retrieval.     2017 IEEE Conference on Computer Vision and Pattern
IEEE transactions on image processing : a publication of the     Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26,
IEEE Signal Processing Society 22. doi:10.1109/TIP.2013.         2017, 4105–4113. IEEE Computer Society. doi:10.1109/
2251645.                                                         CVPR.2017.437. URL http://doi.ieeecomputersociety.org/
                                                                 10.1109/CVPR.2017.437.
Gatys, L.; Ecker, A. S.; and Bethge, M. 2015. Tex-
ture Synthesis Using Convolutional Neural Networks. In
Cortes, C.; Lawrence, N. D.; Lee, D. D.; Sugiyama,
M.; and Garnett, R., eds., Advances in Neural Infor-
mation Processing Systems 28, 262–270. Curran Asso-
ciates, Inc. URL http://papers.nips.cc/paper/5633-texture-
synthesis-using-convolutional-neural-networks.pdf.
Higgins, I.; Mattheyand, L.; Pal, A.; Burgess, C.; Glorot, X.;
Botvinick, M.; Mohamed, S.; and Lerchner, A. 2017. β-
VAE: Learning Basic Visual Concepts With a Constrained
Variational Framework. ICLR URL https://openreview.net/
pdf?id=Sy2fzU9gl.
Hsu, T.; Epting, W. K.; Kim, H.; Abernathy, H. W.; Hack-
ett, G. A.; Rollett, A. D.; Salvador, P. A.; and Holm, E. A.
2020. Microstructure Generation via Generative Adversar-
ial Network for Heterogeneous, Topologically Complex 3D
Materials. arXiv e-prints arXiv:2006.13886.
Johnson, J.; Alahi, A.; and Fei-Fei, L. 2016. Perceptual
losses for real-time style transfer and super-resolution. In
European Conference on Computer Vision.
Kalidindi, S. R. 2015. 1 - Materials, Data, and Informat-
ics. In Hierarchical Materials Informatics, 1 – 32. Boston:
Butterworth-Heinemann. ISBN 978-0-12-410394-8.
Kingma, D. P.; and Welling, M. 2013. Auto-Encoding Vari-
ational Bayes. ICLR abs/1312.6114.