=Paper=
{{Paper
|id=Vol-2964/article_59
|storemode=property
|title=TextureVAE: Learning Interpretable Representations of Material Microstructures Using Variational Autoencoders
|pdfUrl=https://ceur-ws.org/Vol-2964/article_59.pdf
|volume=Vol-2964
|authors=Avadhut Sardeshmukh,Sreedhar Reddy,Bp Gautham,Pushpak Bhattacharyya
|dblpUrl=https://dblp.org/rec/conf/aaaiss/SardeshmukhRPB21
}}
==TextureVAE: Learning Interpretable Representations of Material Microstructures Using Variational Autoencoders==
TextureVAE : Learning Interpretable Representations of Material Microstructures Using Variational Autoencoders Abstract We propose a variational autoencoder model based on style loss for learning representations of material microstructure images. We show using latent space traversals that the model captures important attributes of microstructures that are re- sponsible for mechanical properties of materials and is ca- (a) Cast iron (b) Ultra-high carbon steel pable of generating microstructures with particular attributes. We discuss how the latent vectors can be used to establish a Figure 1: Example Microstructures - The information of in- linkage between structure and properties and enable inverse terest varies with the material system under consideration. inference which is crucial for designing materials and prod- (Credit: (a) Tewary et al. [to be published], (b) (DeCost et al. ucts with target properties. 2017) 1 Introduction When a material is put through a manufacturing process, a random vector of length r has both its ends in the same it’s internal structure is modified, which in turn affects the phase. These are the most widely used correlation functions properties of the material. Materials scientists seek answers for formal mathematical characterization of microstructures. to questions such as what processing is required to achieve However, it has been shown that different microstructures the target properties and how do the structure and properties may, under some conditions, lead to similar 2-point corre- change with the process parameters. It is well known that lations (Cang et al. 2018). And beyond n = 2 (i.e., higher- these mappings are complex, highly non-linear and multiple order spatial correlations), n-point statistics quickly become processing paths can lead to the same property. These map- intractable. Hence, these methods are not easily extensible pings can be best described through the space of structures in general. (Kalidindi 2015). The most commonly available description Another common alternative is using physical descriptors. of the structure is in the form of microscopic images, which For example, consider the two microstructures in Figure 1. is known as the microstructure (because the length scale is The first one is a cast iron microstructure containing spher- roughly 10−6 m). Obtaining compact representations of mi- ical grains of graphite, and the ferrite phase in background. crostructure images 1 is therefore crucial for building robust Here, some physical descriptors of interest are the sizes of process-structure-property linkages. the spherical grains, their density and so on. Whereas the The microstructure contains a lot of information such as second one is an ultra-high carbon steel microstructure con- grain size distribution, volume fractions of different phases taining pearlite phase as an alternating layer with the ferrite and so on. Depending upon the material system under con- phase. Here the important physical descriptors are the ori- sideration, very different types of features and information entation of the lamellar pattern, the inter-lamellar spacing is relevant. Traditionally, materials scientists have used sta- and so on. One needs to employ specific image processing tistical methods such as n-point correlation functions and to extract physical descriptors from given microstructures. Gaussian random fields for obtaining representations of mi- Also, such a representation is specific to the material system crostructures. The n-point correlation functions capture the and is tightly tied to the expertise of a materials scientist in degree of spatial correlation among the locations and con- selecting the right descriptors. stituents in a probabilistic sense (Kalidindi 2015). For ex- Researchers are looking at the recent methods of repre- ample, given a microstructure containing two phases, the 2- sentation learning from the deep learning literature as an al- point statistics can be used to encode the probability that ternative to the above-mentioned microstructure representa- Copyright c 2021for this paper by its authors. Use permitted un- tion methods. A major challenge in leveraging these tech- der Creative Commons License Attribution 4.0 International (CC niques is the scarcity of data. Transfer learning can be used BY 4.0) to mitigate this up to some extent. However, a further chal- 1 For simplicity, henceforth referred to as just microstructure lenge is that it is even more difficult to get microstructures with associated property values, owing to the high cost and further work. Liu et al. (Liu et al. 2015) propose a design time required to do the testing. Deep generative models such method for inferring structures with target properties using as the variational autoencoders aim to learn good latent rep- Bayesian optimization around the GAN generator. The au- resentations in an unsupervised manner. In absence of a su- thors talk about the possibility of using the implicitly learned pervisory signal, these models try to learn latent representa- representation from the GAN discriminator for a structure- tions from which the original inputs can be reconstructed property model, but do not present any study on this. Proba- most accurately. Such a representation is expected to en- bly the closest to our work is Cang et al. (Cang et al. 2018), code all non-redundant information from the image. Fur- where a variational autoencoder model with style loss is pro- ther, these models are capable of synthesizing images that posed for generating microstructures of sandstone. The au- are realistic and statistically equivalent to the training im- thors show that the generated microstructures are more pre- ages. Synthesis is a common goal to support computational dictive of the properties (Young’s modulus, diffusivity and design because the cost and difficulty of experimental char- permeability) than those generated using Gaussian random acterization is often prohibitively high (Hsu et al. 2020). In- field method. They add the style loss to the vanilla VAE ob- spired by this, we propose a variational autoencoder archi- jective function, retaining the original reconstruction loss. tecture to learn low-dimensional microstructure representa- However, the vanilla VAE reconstruction loss is not suitable tions. We demonstrate that the learned latent representation for microstructure images (see section 3.2 for more details). indeed encodes important features, with a use-case in which So we completely replace the reconstruction loss with the such features are known in advance. The representation can style loss. We also show that physically significant factors be physically interpreted in that individual latent dimensions of variation are explicitly encoded in the learned representa- correspond to different features which are known to be im- tion. To the best of our knowledge, ours is the first work on a portant from the physics knowledge. Such a representation is variational autoencoder model for microstructure generation therefore expected to work well for modeling the structure- and interpretable representation learning. property linkages. Our key contribution is an interpretable microstructure 3 Methodology representation method that 3.1 Variational Autoencoders • captures physically significant factors of variation which Variational Autoencoders (Kingma and Welling 2013) are are primarily responsible for the mechanical properties of typically used to learn latent representations of input sam- the material ples in an unsupervised manner. The underlying graphical • can be used to generate different microstructures by vary- model is pθ (x, z) = pθ (z)pθ (x|z), where x is the observed ing these factors variable (input sample) and z is latent variable (the represen- tation). Given an input sample, the latent variables can be in- 2 Related work ferred from the posterior p(z|x). Computing this distribution is a hard problem due to the intractable partition function re- With the recent advances in machine learning, there is a re- quired in applying Bayes’ theorem. In variational inference, newed interest among materials scientists to leverage these an approximate posterior distribution qφ (z|x) from a known advances for material microstructure modeling. Bostanabad family is found by minimizing the KL divergence from the et al. (Bostanabad et al. 2018) provides a detailed review of true posterior. That is, find q ∗ such that the state-of-the-art in computational characterization of ma- terial microstructure. We discuss some of the more recent q ∗ = argminqφ DKL (qφ (z|x)kpθ (z|x)) works on application of deep learning for this task. However, computing the KL divergence also involves the In some recent works, generative adversarial nets (GANs) same intractable integrals as in posterior computation. So and variational autoencoders have been used for material instead of minimizing it, another tractable quantity called microstructure generation. Often the focus is on generation the Evidence Lower Bound (ELBO) derived from the above rather than representation learning. For example, (Banko equation is maximized and it is shown that maximizing et al. 2020) use a conditional GAN to generate microstruc- ELBO is equivalent to minimizing the KL divergence. The tures of thin films conditioned on process parameters and ELBO is defined as: chemical composition. While Hsu et al. (Hsu et al. 2020) use GANs to generate small patches of 3D microstructure of L = −DKL (qφ (z|x)kpθ (z)) + Eqφ (z|x) [log pθ (x|z)] (1) solid oxide fuel cell anodes. They show that the properties The above loss function can be roughly understood as fol- computed by numerical simulations on the generated mi- lows : The second term is the expected log-likelihood of crostructures closely match the experimental observations. getting back the same x starting with the inferred z from the Chun et al. (Chun et al. 2020) use a patch-based GAN to approximate posterior qφ (z|x). It is often called the recon- generate microstructures of heterogeneous energetic materi- struction loss. The first term is a regularizer that penalizes als (propellants explosives and pyrotechnics). The input to posteriors very different from the prior. their model consists of a pair of vectors for each grid loca- The prior pθ (z) and the posterior qφ (z|x) are generally tion (patch). They show that during generation, the two vec- assumed to be Gaussian. The distribution qφ is parameter- tors can be used to control overall morphology. However the ized by an inference network and resembles an encoder. It intuitive meanings of individual dimensions of these vec- outputs the µ and σ of the posterior for a given input sam- tors are not clear and the authors point to this as possible ple (i.e. qφ (z|xi ) = N (z; µxi , σxi )). The distribution pθ is parameterized by a generator network and resembles a de- a major improvement over state-of-the-art in texture syn- coder. It outputs a sample from the distribution pθ given a thesis, it required an optimization step for each genera- latent vector z. tion. This method has been extended to feed-forward ap- proaches that learn a separate generator network to mini- 3.2 Texture-VAE mize the style loss, for example (Johnson, Alahi, and Fei- However, the reconstruction loss in the original VAE objec- Fei 2016), (Ulyanov, Vedaldi, and Lempitsky 2017) and tive is not suitable to model the mismatch between a mi- (Li et al. 2017). The generator transforms a random in- crostructure image and its reconstruction. We discuss the put vector (typically a standard Gaussian) into texture im- reasons as follows. In the decoder network, the reconstruc- ages. However, these methods are not well suited for repre- tion x̂ is a deterministic function of z. Now consider the re- sentation learning because they focus on generation. Most construction loss : Eqφ (z|x) [log pθ (x|z)]. With continuous of these methods require learning one network per texture valued outputs such as images, the generator distribution is (or per style). The difficulty in using the same network for generally assumed to be Gaussian whose mean µ = f (z; θ) multiple styles seems to stem from the fact that the Gram is computed by the generator network. Thus, µ = x̂. Since matrices of different styles have very different scales and the reconstruction loss is an expectation of log of a Gaus- the generator adapts itself to a particular style. There have sian density, it is equivalent to (x − µ)2 (i.e. (x − x̂)2 ) up to been some works on methods capable of learning multiple some constants. Hence the reconstruction loss is equivalent styles/textures with a single model and they seem to focus to pixel-wise mean squared error. It has been argued in many on normalizing the styles and forcing a correlation between works that pixel-wise comparison is not capable of captur- the random input and generated image. Variational autoen- ing perceptual image similarity (see for example (Ding et al. coders naturally address this issue with the encoder learning 2020), (Dosovitskiy and Brox 2016) or (Larsen et al. 2016)). different representations for different styles. Lastly, while This is especially true in case of microstructures, which are the concatenation of Gram matrices characterizes a texture a type of texture images (they contain randomly repeating well, it may not be of much use as a representation vector patterns such as spheres, lines and so on). Imagine a stripes by itself for other downstream tasks because it is very high pattern and another one, shifted one stripe right. A human dimensional (often, more dimensions than the image itself). instantly understands that they are essentially “the same tex- Combining these ideas, we propose using the style loss to ture”, but the pixel-wise difference could be huge. To sum- train a variational autoencoder. In particular we replace the marize, the reconstruction loss term in equation 1 is not suit- reconstruction loss (i.e. the second term) in equation 1 with able for texture images since it leads to a pixel-by-pixel com- the style loss (given in equation 2). We keep the first term parison between input and reconstructed image. This moti- as it is - this regularizes the latent representation by forc- vates replacing the reconstruction loss with a better suited ing all posteriors to be not far from the prior. With a stan- measure for textures. dard Gaussian prior with diagonal covariance matrix, this Since textures are different from natural images that gen- term encourages a representation with statistically indepen- erally contain objects, special considerations are needed for dent dimensions, which is expected to be interpretable. This representing textures. In texture synthesis literature, Gatys is discussed in more detail in section 4.3. We call our model et al. (Gatys, Ecker, and Bethge 2015) proposed to use the the texture-VAE. We show that a single VAE model trained feature correlations computed from different layers of a pre- with style loss can be used for multiple textures. trained network (e.g. VGG19) to represent textures. The fea- ture correlations at layer l are encoded by the Gram ma- trix G(l) , whose elements are inner products between fea- 3.3 Model architecture ture maps at that layer. If layer l has Cl feature maps of size Wl × Hl then: The architecture of our VAE model is shown in Figure 2. Glij = Σk Fikl l Fjk We use the pre-trained VGG19 network (Simonyan and Zis- serman 2014) for computing the style loss and as the en- Where, Ml = Wl ∗ Hl and F l is the Cl × Ml matrix with the coder. The last two fully connected layers of the encoder that flattened feature maps as rows. A texture can then be repre- compute µ and σ of the posterior are trained from scratch. sented by the concatenation of all Gram matrices. Given an For the remaining layers, the pre-trained weights are used input texture image, the authors propose a method to gen- without fine-tuning. The decoder contains four blocks of de- erate similar textures by starting with a random white noise convolution followed by nearest-neighbor upsampling and image and minimizing the squared difference between the LeakyReLU nonlinearity with slope 0.3. It has been previ- representations. Note that the optimization is with respect to ously observed in literature that for generation, explicit up- the image pixels; the weights of the pre-trained network are sampling works better than using fractional strided convo- not changed. They call the squared difference between Gram lutions (Odena, Dumoulin, and Olah 2016). We use a filter matrix concatenations as the “style loss”: size of 3 × 3 throughout. For computing the style loss, we use the pooling layer L X 1 X at each scale, i.e. pool 1 to pool 4 and conv 1 1. Gatys et Lstyle (x, x̂) = wl 2 2 (Gl − Ĝlij )2 (2) 4Cl Ml i,j ij al. (Gatys, Ecker, and Bethge 2015) recommend using the l=0 convolutional layers instead of pooling. But we found that While this approach of “generation-by-optimization” was in our case, pooling layers worked better. (a) (b) (c) (d) (e) (f) Figure 3: Example patch from each microstructure. (a) small dense spheres, (b) small sparse spheres, (c) large spheres, (d) intermediate (e) fine flakes and (f) thick flakes Table 1: Average similarity between reconstructed and orig- inal patches Score Latent Dims DISTS STSIM-2 64 0.7031 0.7592 Figure 2: The variational autoencoder model with style loss. 32 0.7257 0.7640 Lstyle is the sum of squared differences between Gram ma- 16 0.6995 0.7609 trices at all layers and denotes the inner product of feature 8 0.6962 0.7591 maps (F l )T F l The original microstructures were 2048x1532. We use 4 Experiments patches of 128x128 cropped by sliding a window with stride 50 for our training. 4.1 Dataset We use microstructures of cast iron to demonstrate the ca- 4.2 Evaluation pabilities of our texture-VAE model. These microstructures While there are many metrics of perceptual image similar- were produced in a separate study by Ujjal Tewary et al. (to ity, very few of them are focused on texture images. Two be published) of microstructure and mechanical properties recently proposed metrics of texture similarity that seem to of cast iron produced using sand casting. In the process of be best suited for our evaluation are - i) Deep Image Struc- sand casting, molten iron ore mixed with C, Si, Mg etc. is ture and Texture Similarity - DISTS (Ding et al. 2020) and poured into molds of desired size and kept in sand for cool- ii) Structural Texture Similarity - STSIM (Ehmann, Pappas, ing. In case of cylindrical molds, the sample cools from the and Neuhoff 2013). The DISTS score consists of two terms - surface to its core, so the radius governs the cooling rate. first one compares the means of features maps (from a vari- In the study, cylindrical castings of various radii (resulting ant of pre-trained VGG) and the second one computes cross in different cooling rates) were made using sand casting. covariance between them. The two terms are combined us- Microstructures of these cylindrical castings were then cap- ing weights tuned to match human judgments and be invari- tured using a scanning electron microscope. The original ant to re-sampled patches from the same texture. The STSIM study consisted of many experiments but we describe here metric is based on a similar modification of Structural Sim- only those that correspond to the microstructures we used. ilarity Metric (SSIM) to completely avoid pixel-by-pixel We use the microstructures of 12 samples resulting from comparison, but is computed in the Fourier spectrum. We combination of four cooling rates corresponding to cylin- used the authors’ implementation of DISTS2 . STSIM has ders of radius 12, 24, 36 and 48mm and three compositions several configurations and we compute the STSIM-2 metric (mainly varying Magnesium - 0, 0.025 and 0.045% weight). using a publicly available implementation3 . Table 1 shows The images were captured at 100µm length scale, without the average similarity over 100 instances between original any etching. The microstructures mainly contain ferrite and patches and reconstructions from texture-VAE models with graphite phases. The cooling rate and initial composition af- number of latent vector dimensions 8, 16, 32 and 64. The fect the grain size, density and morphology (i.e. appearance similarity scores are on a scale of 0 to 1, 1 being the high- of the graphite - spherical, flaky or both) of the resulting mi- est. The similarity seems to slightly increase with increas- crostructure. Figure 3 shows small 128x128 patches from a ing number of latent dimensions, but beyond 64 it didn’t in- few microstructures. We have chosen these so that the vari- crease, so we stopped there. ations in grain size (small to large), density (low to high) One way to qualitatively evaluate a VAE model is to look and morphology (spherical, flakes and intermediate) can be at the reconstructed and newly generated samples. Figure 4 clearly seen. shows some example reconstructions while Figure 5 shows Out of the 12, 6 samples corresponding to the lowest and some randomly generated samples from texture-VAE model highest cooling rates were also subjected to uni-axial com- with 64 latent dimensions. Although the latent-32 model pression to obtain their stress-strain behavior. We used all 12 had slightly higher similarity scores for reconstructions than microstructures to train the texture-VAE model while the 6 2 with property values were used for property prediction task https://github.com/dingkeyan93/DISTS 3 described in section 4.4. https://github.com/andreydung/Steerable-filter z[17] z[23] (a) Small spheres (b) Large spheres (c) Intermediate z[26] z[34] (d) Intermediate (e) Fine flakes (f) Thick flakes (a) Starting with large spheres Figure 4: Example reconstructions z[17] z[23] z[26] (a) (b) (c) (d) (e) z[34] Figure 5: Randomly generated samples (b) Starting with fine flakes Figure 6: Effect of varying latent dimensions, starting with latent-64, the latter recovered finer details better, which are different morphologies. Each row corresponds to one latent important from the domain perspective (for example the dimension. The leftmost image is the original patch and the ferrite grain boundaries, explained in the next paragraph). remaining are variations obtained by varying that particular Hence we did all further experimentation with the latent-64 latent dimension model. The left half of each image in Figure 4 is a patch from the original microstructure - x, while the right half is the reconstructed image - x̂ = Dec(Enc(x)). We have tion since they seem to produce physically significant vari- shown some representative examples from each microstruc- ations that are visually discernible as well. Figure 6a shows ture. It can be seen that even the minute structural details the variations starting with a large-spheres microstructure, such as the ferrite grain boundaries (the thin lines in the gray whereas Figure 6b shows the variations starting with a fine- portion) which are faintly visible only in the case of large flakes microstructure. From the figure, dimension 17 seems spherical grains in Figure 4b are reconstructed quite well. to correspond to morphology, with lower values indicating These results show that the texture-VAE model is capable flaky and higher values indicating spherical structures. Di- of reconstructing input samples quite well across different mensions 23 and 26 seem to correspond to the density and textures. The randomly generated samples also span differ- size respectively of spherical grains, with their values in- ent textures and look structurally similar to the original ones. creasing as we go from left to right. Whereas dimension 34 seems to correspond to the density of flakes. From the 4.3 Interpretability physics of Cast iron microstructures, it is known that grain Variational autoencoders have been shown to recover factors size and density are correlated - when the spherical grains of variation in the training data ((Kingma and Welling 2013), are large (or the flakes are thick), they are more likely to (Higgins et al. 2017)). The first term in the learning objec- be sparse. This correlation seems to be well-captured in the tive of VAE encourages the posterior pθ (z|x) to be like the variations of dimensions 23, 26 and 34. prior p(z) which is a standard normal distribution with diag- onal covariance matrix. That is, this term encourages the la- 4.4 Structure-Property linkage tent dimensions to be statistically independent (Higgins et al. As shown in Figure 6, for cast iron microstructures, some 2017). Such representations are easier to interpret and can of the latent dimensions seem highly correlated with quan- be more useful in downstream tasks ((Ridgeway 2016) and tities such as grain size, morphology, grain density and so (Bengio, Courville, and Vincent 2013)). We perform exper- on. It is known that these factors have a profound impact iments to show that the texture-VAE model recovers physi- on the mechanical properties of cast iron. For example, the cally significant factors of variation. spherical grains prevent a passing crack from further propa- Starting with an image x, we obtain its latent representa- gating, so lead to higher strength. Whereas the flakes deflect tion z = Enc(x). Then we choose a dimension i of z and the crack into a number of other directions, so lead to brit- vary it in the range [−4, 4] by choosing 10 equally spaced tleness. Consequently, the representation is expected to lend values, while keeping all other dimensions unchanged. That itself to a more accurate property prediction model. In the is, z 0 [i] = j, j ∈ linspace(−4, 4, 10) and z 0 [k] = z[k] for following, we describe some experiments that support this all other dimensions k. By decoding these z 0 vectors, we ob- claim. serve the variations in the image space. Figure 6 shows im- As stated earlier, the stress-strain curves of 6 microstruc- ages of two examples obtained by varying dimensions 17, tures corresponding to the smallest and largest cooling rates 23, 26 and 34. These dimensions were chosen for illustra- were available, from which we obtained ultimate tensile Table 2: Property prediction accuracy Table 3: Prediction of Ys - Generalization Method UTS Ys Method MAPE R2 MAPE R2 MAPE TextureVAE + SVR 8.02 LinReg 0.89 20 0.88 14 VGG19 + SVR 18.22 SVR-RBF 0.97 10 0.98 5 5 Conclusion We have presented a variational autoencoder model to learn microstructure representations. The objective function is ob- tained by replacing the reconstruction loss in the vanilla VAE with the style loss. We applied the model on a set of experimental cast iron microstructures. Through latent space traversals, we showed that the learned representation explic- itly encodes factors of variation that are primarily respon- sible for the mechanical properties (such as ultimate ten- sile strength and yield strength). Consequently, the repre- sentation is highly predictive of mechanical properties. We Figure 7: Prediction of Ys for an unseen microstructure showed that the regression model built using these repre- sentations can reasonably predict the properties for totally unseen morphologies. Since the learned representation is predictive of mechani- strength and yield strength. We trained a regression model cal properties and some of its dimensions can be physically from the latent representations of patches of these mi- interpreted, we expect that it can be used for inverse infer- crostructures to the property values. Note that the property ence as well - i.e. predicting the structure required to get values correspond to original full-size microstructure im- desired properties. A probabilistic model such as Bayesian ages, whereas our model’s input size is 128x128. We assume network that can represent the joint distribution between la- that all 128x128 patches cropped from the same microstruc- tent dimensions and properties can be used to infer the most ture image have the same property value. A validation set probable values of the latent dimensions given the proper- containing 20% of the patches was kept aside for evaluation. ties. The obtained latent vector can then be decoded using Table 2 shows the R2 value and mean absolute percentage the VAE model to give the required microstructure. This is error in prediction of ultimate tensile strength (UTS) and a direction we are pursuing as further work. We believe the yield strength (Ys) on the validation set. The table shows that present work is a step towards a general framework for learn- we get reasonably good accuracy even with a simple linear ing interpretable microstructure representations. regression model, revealing that the learned representation is highly predictive of the properties. With a more expres- sive model such as support vector regression (with a radial References basis kernel) the accuracy goes significantly higher, further Banko, L.; Lysogorskiy, Y.; Grochla, D.; Naujoks, D.; strengthening the belief in the predictive power of the repre- Drautz, R.; and Ludwig, A. 2020. Predicting structure sentation. zone diagrams for thin film synthesis by generative ma- To test generalization, we trained the SVR model for yield chine learning. Communications Materials 1(1): 15. ISSN strength using only five microstructures and used it to pre- 2662-4443. doi:10.1038/s43246-020-0017-2. URL https: dict the yield strength for the sixth microstructure. Note that //doi.org/10.1038/s43246-020-0017-2. this is different from the above experiment on the valida- Bengio, Y.; Courville, A.; and Vincent, P. 2013. Represen- tion set. Here, the regression model does not see any patches tation Learning: A Review and New Perspectives. IEEE (and the property values) from the excluded microstructure. Trans. Pattern Anal. Mach. Intell. 35(8): 17981828. ISSN The missing microstructure corresponds to the lowest cool- 0162-8828. doi:10.1109/TPAMI.2013.50. URL https://doi. ing rate which results in the largest spherical grains. Figure org/10.1109/TPAMI.2013.50. 7 shows the histogram of predicted values on all patches of this microstructure. It can be seen that the mean prediction Bostanabad, R.; Zhang, Y.; Li, X.; Kearney, T.; Brinson, L.; is near 550M pa. The true value found from experiments is Apley, D.; Liu, W.; and Chen, W. 2018. Computational mi- 598M pa, so the prediction is off by about 8% not deviating a crostructure characterization and reconstruction: Review of lot from the 5% error on the validation set. We performed the the state-of-the-art techniques. Progress in Materials Sci- same experiment using latent representations obtained from ence 95: 1–41. unmodified, pre-trained VGG19 network. Table 3 shows that Cang, R.; Li, H.; Yao, H.; Jiao, Y.; and Ren, Y. 2018. Im- the texture-VAE representations generalize much better as proving direct physical properties prediction of heteroge- compared to pre-trained VGG19. We think that the reason neous materials from imaging data via convolutional neu- behind better generalization with our representation is that it ral network and a morphology-aware generative model. encodes physically significant attributes. Computational Materials Science 150: 212 – 221. ISSN 0927-0256. doi:https://doi.org/10.1016/j.commatsci.2018. Larsen, A. B. L.; Snderby, S. K.; Larochelle, H.; and 03.074. URL http://www.sciencedirect.com/science/article/ Winther, O. 2016. Autoencoding beyond pixels using a pii/S0927025618302337. learned similarity metric. volume 48 of Proceedings of Chun, S.; Roy, S.; Nguyen, Y. T.; Choi, J. B.; Udayku- Machine Learning Research, 1558–1566. New York, New mar, H. S.; and Baek, S. S. 2020. Deep learning for syn- York, USA: PMLR. URL http://proceedings.mlr.press/v48/ thetic microstructure generation in a materials-by-design larsen16.html. framework for heterogeneous energetic materials. Scien- Li, Y.; Fang, C.; Yang, J.; Wang, Z.; Lu, X.; and Yang, M. tific reports 10(1): 13307–13307. ISSN 2045-2322. doi: 2017. Diversified Texture Synthesis with Feed-Forward Net- 10.1038/s41598-020-70149-0. URL https://pubmed.ncbi. works. In 2017 IEEE Conference on Computer Vision and nlm.nih.gov/32764643. 32764643[pmid]. Pattern Recognition (CVPR), 266–274. DeCost, B. L.; Hecht, M. D.; Francis, T.; Webler, B. A.; Liu, R.; Kumar, A.; Chen, Z.; Agrawal, A.; Sundararagha- Picard, Y. N.; and Holm, E. A. 2017. UHCSDB: Ultra- van, V.; and Choudhary, A. 2015. A predictive machine High Carbon Steel Micrograph DataBase. Integrating Ma- learning approach for microstructure optimization and ma- terials and Manufacturing Innovation 6(2): 197–205. URL terials design. In Nature Scientific Reports, volume 5. https://doi.org/10.1007/s40192-017-0097-0. Odena, A.; Dumoulin, V.; and Olah, C. 2016. Deconvolution Ding, K.; Ma, K.; Wang, S.; and Simoncelli, E. P. 2020. and Checkerboard Artifacts. Distill URL http://distill.pub/ Image Quality Assessment: Unifying Structure and Texture 2016/deconv-checkerboard/. Similarity. CoRR abs/2004.07728. URL https://arxiv.org/ abs/2004.07728. Ridgeway, K. 2016. A Survey of Inductive Biases for Facto- rial Representation-Learning. CoRR abs/1612.05299. URL Dosovitskiy, A.; and Brox, T. 2016. Generating Images http://arxiv.org/abs/1612.05299. with Perceptual Similarity Metrics based on Deep Net- works. In Lee, D.; Sugiyama, M.; Luxburg, U.; Guyon, Simonyan, K.; and Zisserman, A. 2014. Very Deep Convolu- I.; and Garnett, R., eds., Advances in Neural Information tional Networks for Large-Scale Image Recognition. CoRR Processing Systems, volume 29, 658–666. Curran Asso- abs/1409.1556. ciates, Inc. URL https://proceedings.neurips.cc/paper/2016/ Ulyanov, D.; Vedaldi, A.; and Lempitsky, V. S. 2017. Im- file/371bce7dc83817b7893bcdeed13799b5-Paper.pdf. proved Texture Networks: Maximizing Quality and Diver- Ehmann, J.; Pappas, T.; and Neuhoff, D. 2013. Structural sity in Feed-Forward Stylization and Texture Synthesis. In Texture Similarity Metrics for Image Analysis and Retrieval. 2017 IEEE Conference on Computer Vision and Pattern IEEE transactions on image processing : a publication of the Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, IEEE Signal Processing Society 22. doi:10.1109/TIP.2013. 2017, 4105–4113. IEEE Computer Society. doi:10.1109/ 2251645. CVPR.2017.437. URL http://doi.ieeecomputersociety.org/ 10.1109/CVPR.2017.437. Gatys, L.; Ecker, A. S.; and Bethge, M. 2015. Tex- ture Synthesis Using Convolutional Neural Networks. In Cortes, C.; Lawrence, N. D.; Lee, D. D.; Sugiyama, M.; and Garnett, R., eds., Advances in Neural Infor- mation Processing Systems 28, 262–270. Curran Asso- ciates, Inc. URL http://papers.nips.cc/paper/5633-texture- synthesis-using-convolutional-neural-networks.pdf. Higgins, I.; Mattheyand, L.; Pal, A.; Burgess, C.; Glorot, X.; Botvinick, M.; Mohamed, S.; and Lerchner, A. 2017. β- VAE: Learning Basic Visual Concepts With a Constrained Variational Framework. ICLR URL https://openreview.net/ pdf?id=Sy2fzU9gl. Hsu, T.; Epting, W. K.; Kim, H.; Abernathy, H. W.; Hack- ett, G. A.; Rollett, A. D.; Salvador, P. A.; and Holm, E. A. 2020. Microstructure Generation via Generative Adversar- ial Network for Heterogeneous, Topologically Complex 3D Materials. arXiv e-prints arXiv:2006.13886. Johnson, J.; Alahi, A.; and Fei-Fei, L. 2016. Perceptual losses for real-time style transfer and super-resolution. In European Conference on Computer Vision. Kalidindi, S. R. 2015. 1 - Materials, Data, and Informat- ics. In Hierarchical Materials Informatics, 1 – 32. Boston: Butterworth-Heinemann. ISBN 978-0-12-410394-8. Kingma, D. P.; and Welling, M. 2013. Auto-Encoding Vari- ational Bayes. ICLR abs/1312.6114.