Data Driven & Physics Constrained Perturbations For Turbulence Model
                              Uncertainty Estimation
                          Jan Felix Heyse, Aashwin Ananda Mishra, Gianluca Iaccarino
                      Mechanical Engineering Department, Stanford University, Stanford, CA-94305, USA


                            Abstract
Turbulence models represent the workhorse for academic and industrial studies involving real life manifestations of fluid turbulence. However, due to the simplifications inherent in their formulation, such turbulence models have a high degree of epistemic uncertainty associated with their predictions. Estimating this model form uncertainty is a critical and long standing problem in turbulence modeling and engineering design. To this end, the direct application of machine learning to estimate turbulence model uncertainties ignores physics based domain knowledge and may even lead to unphysical results. In this light, we outline a framework that utilizes data driven algorithms in conjunction with physics based constraints to generate reliable uncertainty estimates for turbulence models while ensuring that the solutions are physically permissible. The trained machine learning model, utilizing the random forest algorithm, is embedded in a Computational Fluid Dynamics solver and applied to complex problems to test and illustrate its efficacy. This library is to be released as a computational software tool that enables the inclusion of physics based constraints in applications of machine learning in turbulence modeling.

Copyright © 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0)

                        Introduction
Fluid turbulence is a central problem across a variety of disciplines in science and engineering, including Mechanical, Aerospace and Civil Engineering; Biomedical, Oceanographic, Meteorological and Astrophysical Sciences, besides others. The ability to reliably predict the evolution of turbulent flows would lead to seminal advances across these fields. However, in spite of over a century of focused research, no analytical theories to predict the evolution of turbulence have been developed. With the present state of computational resources, a purely numerical resolution of turbulent time and length scales encountered in engineering problems is not viable in industrial design practice. Consequently, almost all investigations have to resort to some degree of modeling. Turbulence models are constitutive relations attempting to relate quantities of interest to flow parameters using assumptions and simplifications derived from physical intuition and observations. Reynolds-averaged Navier-Stokes (RANS)-based models represent the pragmatic recourse for complex engineering flows, with a vast majority of simulations, in both academia and industry, resorting to this avenue. Despite their widespread use, RANS-based models suffer from an inherent structural inability to replicate fundamental turbulence processes and specific flow phenomena, as they introduce a high degree of epistemic uncertainty into the simulations arising due to the model form (Craft, Launder, and Suga 1996; Schobeiri and Abdelfattah 2013).
   In this light, uncertainty quantification for RANS-based closures attempts to assess the trustworthiness of model predictions of quantities of interest and is thus of considerable utility in establishing RANS models as reliable tools for engineering applications. To this end, investigators in the recent past have utilized data driven machine learning approaches to engender interval estimates of uncertainty on turbulence model predictions. Large corpora of available data from experiments and higher fidelity simulations present an opportunity to enhance the predictive capabilities of RANS simulations. Traditionally, data has been used in the context of turbulence modeling only for model calibration and to define model corrections. Almost all turbulence models involve some empirical constants which are tuned to optimize the RANS predictions with respect to specific calibration cases (Hanjalić and Launder 1972). Over the last decade, there has been an increasing attempt to utilize data driven approaches to quantify the epistemic uncertainties in RANS models.
   As illustrative instances, Wang and Dow (2010) studied the structural uncertainties of the k − ω turbulence model by modeling the eddy viscosity discrepancy (i.e. the difference between reference high-fidelity data and RANS predictions) as a random field. Their approach is based on Monte Carlo sampling, but given its slow convergence, a considerable number of simulations are required in order to obtain meaningful uncertainty estimates. Wu, Xiao, and Paterson (2018) used data driven algorithms to predict Reynolds stress discrepancies. The target of the machine learning model was a post-hoc, local correction term for the RANS model’s predictions. Duraisamy, Iaccarino, and Xiao (2019) provide a more comprehensive review of how data has been used to enhance turbulent flow simulations.
   However, such direct application of machine learning models to problems in physical sciences, such as fluid flow
and turbulence modeling, may not completely account for
the domain knowledge and more importantly, all the essen-
tial physics based constraints required. As an illustration,
the Reynolds averaging carried out in turbulence modeling introduces a term in the momentum equations that requires further modeling assumptions or simplifications, i.e. it is unclosed. This term is referred to as the Reynolds stress tensor, $R_{ij} = \langle u_i u_j \rangle$, where $u_i$ are components of the fluctuating velocity field after filtering. This is the key Quantity of Interest in turbulence modeling. However, there are essential physics based constraints that any prediction of the Reynolds stresses must follow. These are referred to as realizability constraints. Schumann (1977) was the first to articulate the realizability constraint in the context of turbulence closures, requiring models to yield a Reynolds stress tensor that ensures $R_{\alpha\alpha} \geq 0$, $R_{\alpha\beta}^2 \leq R_{\alpha\alpha} R_{\beta\beta}$ and $\det(R) \geq 0$ (no summation over the Greek indices). Unless these constraints are explicitly adhered to, model predictions are unphysical. Furthermore, straightforward application of machine learning models to turbulence modeling without physics based constraints has led to unrealizable predictions. When such data driven models are integrated in Computational Fluid Dynamics (CFD) software, unrealizable predictions can cause problems in numerical convergence and even numerical instability.

Figure 1: Asymmetric plane diffuser setup. Inflow from the left, outflow to the right.

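
The three realizability conditions can be checked numerically for any candidate Reynolds stress tensor. The following is a minimal sketch in Python with NumPy. The function name `is_realizable`, the tolerance `tol`, and the two example tensors are illustrative assumptions and are not part of the framework described in this paper.

```python
import numpy as np


def is_realizable(R, tol=1e-12):
    """Check the realizability conditions for a 3x3 Reynolds stress tensor.

    Hypothetical helper for illustration; `tol` absorbs round-off error.
    """
    R = np.asarray(R, dtype=float)
    if R.shape != (3, 3) or not np.allclose(R, R.T, atol=tol):
        return False
    diag = np.diag(R)
    # Normal stresses must be non-negative: R_aa >= 0
    if np.any(diag < -tol):
        return False
    # Shear stresses are bounded by the normal stresses: R_ab^2 <= R_aa * R_bb
    for a in range(3):
        for b in range(a + 1, 3):
            if R[a, b] ** 2 > diag[a] * diag[b] + tol:
                return False
    # The determinant must be non-negative: det(R) >= 0
    return bool(np.linalg.det(R) >= -tol)


# Made-up example tensors (assumptions, not data from the paper)
isotropic = (2.0 / 3.0) * np.eye(3)        # isotropic state with k = 1
bad_shear = np.array([[0.5, 1.0, 0.0],
                      [1.0, 0.5, 0.0],
                      [0.0, 0.0, 1.0]])    # violates R_12^2 <= R_11 * R_22

print(is_realizable(isotropic), is_realizable(bad_shear))
```

For a symmetric tensor, these conditions taken over all index pairs require every principal minor to be non-negative, which is equivalent to positive semi-definiteness. Testing that all eigenvalues are non-negative should therefore give the same verdict up to round-off.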
   In this investigation, we outline a methodology that introduces physics constrained perturbations to estimate structural uncertainty in turbulence models. Thence, we utilize machine learning algorithms to infer these perturbations from labeled data. These two steps together ensure that this framework is both physics constrained and data driven. Finally, we integrate this library into CFD software suites and carry out tests for robustness and reliability.

Figure 2: Diffuser simulation. Inflow (green) and mesh (blue) sensitivities, baseline calculation (black). Profiles of streamwise velocity at different x locations with experimental data (red).

   After an overview of the problem in Section I, we outline the test problem in Section II. Thence, we outline the physics constrained perturbation framework which is applied without and with inference from data (data free and data driven). In the data free results, we utilize the maximum physically permissible perturbations. In the data driven framework, we train a random forest regressor to predict the perturbations using data from other flows and integrate this trained model in the CFD software suite. We conclude with a summary of this work and directions for future research.

    Baseline Simulation: Turbulent Diffuser
The test case in this work is the turbulent separated flow in a diffuser. Diffusers are used to decelerate the flow and increase the static pressure of the fluid. The operating principle is simply a change in cross-sectional area, but space constraints and reduction of losses often lead to configurations that are prone to flow separation. Prediction of the turbulent flow in a diffuser represents the challenge in this work.
   The turbulent flow in the planar asymmetric diffuser first described by Obi, Aoki, and Masuda (1993) is considered. Figure 1 shows the setup: A channel is expanded from inflow width H to outflow width 4.7H. In the expansion section, the bottom wall is opening up at a 10° angle. The corners at the beginning and the end of that slope are rounded with a radius of 9.7H.
   The inflow is fully turbulent and has the Reynolds number 20,000, based on the centerline velocity and inflow channel height H. Interesting features of the resulting flow are flow separation, flow reattachment, and development of a new boundary layer. RANS simulations were carried out in OpenFOAM using the k − ε turbulence model. Fully turbulent channel flow was used as inflow condition at x/H = −10. The outlet was at x/H = 60. The baseline calculation had a structured mesh with 9,472 cells, 148 in the x and 64 in the y direction.
   Both a mesh convergence study and an inflow sensitivity study were performed. For mesh convergence, two coarser and two finer grids were used, with the number of grid cells changing by a factor of 2 between levels. For inflow sensitivity, the inlet velocity profile was distorted at constant flow rate to vary the centerline velocity between 90% and 110% of its nominal value. The results can be seen in figure 2, where the streamwise velocity component is plotted at different x locations across the channel height. All individual mesh and inflow solutions are plotted, giving an impression of the respective sensitivities, as well as the baseline solution and experimental data from Buice and Eaton (2000). Very limited sensitivity to boundary conditions and numerical errors is observed in and after the expanding section, providing confidence in the computations. Yet, while the flow remains attached at all times in the RANS simulations, the experimental data reveals the existence of a large flow separation that has not been captured by the simulation. The simulations are overpredicting the streamwise velocity in the lower half of the channel and underpredicting it in the upper half.

      Data free Uncertainty Quantification
Herein, we outline the physics constrained perturbation framework that ensures Reynolds stress realizability. Given an initially realizable Reynolds stress tensor, this framework ensures that the perturbed Reynolds stresses remain positive semi-definite. The Reynolds stress tensor can be decomposed into the isotropic and deviatoric (anisotropic) components as

$$R_{ij} = 2k\left(b_{ij} + \frac{\delta_{ij}}{3}\right). \qquad (1)$$

Here, $k = R_{ii}/2$ is the turbulent kinetic energy and $b_{ij} = \frac{R_{ij}}{2k} - \frac{\delta_{ij}}{3}$ is the Reynolds stress anisotropy tensor. The Reynolds stress anisotropy tensor can be expressed as $b_{in} v_{nl} = v_{in} \Lambda_{nl}$, where $v_{nl}$ is the matrix of orthonormal eigenvectors and $\Lambda_{nl}$ is the traceless diagonal matrix of eigenvalues $\lambda_k$. Multiplication by $v_{jl}$ yields $b_{ij} = v_{in} \Lambda_{nl} v_{jl}$. This is substituted into Equation (1) to yield

$$R_{ij} = 2k\left(v_{in} \Lambda_{nl} v_{jl} + \frac{\delta_{ij}}{3}\right). \qquad (2)$$

The tensors $v$ and $\Lambda$ are ordered such that $\lambda_1 \geq \lambda_2 \geq \lambda_3$. In this representation, the shape, the orientation and the amplitude of the Reynolds stress ellipsoid are directly represented by the turbulence anisotropy eigenvalues $\lambda_l$, eigenvectors $v_{ij}$ and the turbulent kinetic energy $k$, respectively.
   To account for the errors due to closure assumptions, the tensor perturbation approach introduces perturbations into the modeled Reynolds stress during the CFD solution iterations. This perturbed form is expressed as:

$$R_{ij}^* = 2k^*\left(\frac{\delta_{ij}}{3} + v_{in}^* \Lambda_{nl}^* v_{jl}^*\right) \qquad (3)$$

where $*$ represents the perturbed quantities. Thus, $k^* = k + \Delta k$ is the perturbed turbulent kinetic energy, $v_{in}^*$ is the perturbed eigenvector matrix, and $\Lambda_{nl}^*$ is the diagonal matrix of perturbed eigenvalues, $\lambda_l^*$.
   In this context, the eigenvalue perturbation can be expressed as a sum of perturbations towards the 3 corners of the barycentric map. The corners of that triangle correspond to limiting states of turbulence with 1, 2, and 3 components, respectively. The expression for the Reynolds stresses with only eigenvalue perturbations is given by $R_{ij}^* = 2k\left(\frac{\delta_{ij}}{3} + v_{in} \Lambda_{nl}^* v_{jl}\right)$, where $\Lambda_{nl}^*$ represents the diagonal matrix of perturbed eigenvalues. The perturbed eigenvalues can be expressed by the mapping $\lambda_l^* = B^{-1} x^*$. Here, $x^* = x + \Delta_B (x_t - x)$ is the representation of the perturbation in the barycentric triangle, with $x$ being the unperturbed state in the barycentric map, $x^*$ the perturbed position, $x_t$ the state perturbed toward, and $\Delta_B$ the magnitude of the perturbation. In this context, $\lambda_l^* = B^{-1} x^*$ can be simplified to $\lambda_l^* = (1 - \Delta_B)\lambda_l + \Delta_B B^{-1} x_t$. Here, $B$ defines a linear map between the perturbation in the barycentric triangle and the eigenvalue perturbations. With the three vertices $x_{1C}$, $x_{2C}$, and $x_{3C}$ as the target states, we have $B^{-1} x_{1C} = (2/3, -1/3, -1/3)^T$, $B^{-1} x_{2C} = (1/6, 1/6, -1/3)^T$, and $B^{-1} x_{3C} = (0, 0, 0)^T$. Figure 3 shows the triangle in the barycentric map as well as one realizable location $x$ coming from a RANS turbulence model. The eigenvalue perturbations add three perturbed simulations to the baseline calculation, one for each limiting state, leading to a total of four calculations. The uncertainty estimates are constructed by computing the range of values across the four calculations. The minimum and maximum values of the range form envelopes for any quantity of interest.

Figure 3: Barycentric domain and eigenvalue perturbation.

   The framework of eigenvalue perturbations is applied to the present test case. Figure 4 shows the resulting uncertainty envelopes, which cover the experimental results in most locations. Unlike the mesh study and the inflow sensitivity study from the previous section, this analysis correctly indicates that there might be a region of flow recirculation at the bottom wall. The uncertainty estimates, however, go beyond the experimental data, in some regions substantially. In other words, they seem to overestimate the modeling errors in some locations. This is expected because the perturbations are targeting all possible extreme states of turbulence anisotropy without consideration of their plausibility. The data free framework perturbs the Reynolds stresses everywhere in the domain all the way to the respective limiting state. Yet, the Reynolds stress predictions of the turbulence model do not have the same level of inaccuracy throughout the domain.

Figure 4: Data free, uniform eigenvalue perturbation. Profiles of streamwise velocity at different x locations.

                                Table 1
   #    Feature                                 #     Feature
   #1   tr(S) / (|tr(S)| + τt^−1)               #7    P / (|P| + ε)
   #2   tr(S^2) / (|tr(S^2)| + τt^−2)           #8    (k/ε) / (|k/ε| + S^−1)
   #3   tr(S^3) / (|tr(S^3)| + τt^−3)           #9    u / c0
   #4   tr(R^2) / (|tr(R^2)| + τt^−2)           #10   √k / u
   #5   tr(R^2 S^2) / (|tr(R^2 S^2)| + τt^−4)   #11   min(√k dw / 50ν, 4)
   #6   (W^2 − S^2) / (W^2 + S^2)               #12   |gj sj| / g ∗ √k / u
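
The data free eigenvalue perturbation described in this section can be sketched in a few lines of Python with NumPy. This is a minimal illustration under stated assumptions, not the implementation embedded in the CFD solver. The names `decompose`, `perturb_eigenvalues`, and `LIMITING_EIGENVALUES`, as well as the baseline tensor `R_baseline`, are hypothetical. Only the eigenvalues are perturbed, using $\lambda_l^* = (1 - \Delta_B)\lambda_l + \Delta_B B^{-1} x_t$, while $k$ and the eigenvectors are kept at their baseline values.

```python
import numpy as np

# Anisotropy eigenvalues B^{-1} x_t of the three limiting states quoted in the text
LIMITING_EIGENVALUES = {
    "1C": np.array([2.0 / 3.0, -1.0 / 3.0, -1.0 / 3.0]),
    "2C": np.array([1.0 / 6.0, 1.0 / 6.0, -1.0 / 3.0]),
    "3C": np.array([0.0, 0.0, 0.0]),
}


def decompose(R):
    """Split R into k, sorted anisotropy eigenvalues, and eigenvectors."""
    k = 0.5 * np.trace(R)
    b = R / (2.0 * k) - np.eye(3) / 3.0
    lam, v = np.linalg.eigh(b)        # eigh returns eigenvalues in ascending order
    order = np.argsort(lam)[::-1]     # reorder so that lam1 >= lam2 >= lam3
    return k, lam[order], v[:, order]


def perturb_eigenvalues(R, target, delta_b=1.0):
    """Move the anisotropy eigenvalues a fraction delta_b toward a limiting state."""
    k, lam, v = decompose(R)
    lam_star = (1.0 - delta_b) * lam + delta_b * LIMITING_EIGENVALUES[target]
    b_star = v @ np.diag(lam_star) @ v.T
    return 2.0 * k * (b_star + np.eye(3) / 3.0)


# Made-up realizable baseline tensor, for illustration only (k = 1)
R_baseline = np.array([[1.0, 0.3, 0.0],
                       [0.3, 0.6, 0.0],
                       [0.0, 0.0, 0.4]])

perturbed = {name: perturb_eigenvalues(R_baseline, name)
             for name in LIMITING_EIGENVALUES}
for name, R_star in perturbed.items():
    print(name, np.round(np.linalg.eigvalsh(R_star), 6))
```

With `delta_b=1.0` each call moves the state all the way to the chosen corner, which corresponds to the uniform perturbation used for the data free envelopes. In a solver, each perturbed tensor would drive one additional calculation, and the envelope of a quantity of interest would be the minimum and maximum over the baseline and the three perturbed results.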
Table 1: Non-dimensional features used for the random regression forest. The following variables are used: the mean rate of strain and rotation $S_{ij} = \frac{1}{2}(\nabla u_{ij} + \nabla u_{ji})$, $W_{ij} = \frac{1}{2}(\nabla u_{ij} - \nabla u_{ji})$; the turbulence time scale $\tau_t = k/\varepsilon$; the unit vector along the streamline $s_i = u_i/u$; and the gradient of the streamline aligned velocity $g_i = s_j \frac{\partial u_j}{\partial x_i}$.

    Data driven Uncertainty Quantification
We study a data driven approach to predict a local eigenvalue perturbation strength based on mean flow features. Here, we define the local perturbation strength $p$ as the distance in barycentric coordinates between the unperturbed and the perturbed projection of the Reynolds stress. $p$ is predicted by a machine learning model using physically relevant flow features $f_i$ as input. Figure 3 illustrates the meaning of $p$ in the barycentric map. The original location $\vec{x}_{LF}$ is perturbed towards the same extreme states as in the data free approach, but now the perturbed locations, marked by grey dots, are not more than $p$ away from the original position. In the example from the illustration, that means the 3-component limiting state is reached, while the perturbations towards the 1- and 2-component limits are smaller. The perturbation strength $p$ is directly related to the perturbation magnitude as $\Delta_B = \min(p/d_t, 1)$, where $d_t$ is the distance in the barycentric map between the unperturbed state and the respective corner towards which it is perturbed. The perturbed locations are still always within the triangle and therefore within the constraints of realizability. This definition of the perturbation strength means that the effective perturbations cannot be greater than for the data free case, but they can be smaller.
   A random regression forest is chosen as the machine learning regression model. Random forests are a supervised learning algorithm. They are ensemble learners, meaning that they leverage a number of decorrelated simpler models to make a prediction. In this case of a random forest, the simpler models are regression trees (Breiman et al. 1984). Regression trees are able to learn non-linear functions. They are also robust to extrapolation, since they cannot produce predictions outside the range of the training data labels, and to uninformative features (Ling and Templeton 2015; Milani et al. 2017).
   In machine learning models, the mean squared error can be decomposed into the squared bias of the estimate, the variance of the estimate, and the irreducible error:

$$\mathrm{MSE}(x) = \underbrace{\left(E[\hat{f}(x)] - f(x)\right)^2}_{\text{squared bias}} + \underbrace{E\left[\left(\hat{f}(x) - E[\hat{f}(x)]\right)^2\right]}_{\text{variance}} + \underbrace{\sigma_e^2}_{\text{irred. err.}},$$

where $\hat{f}(x)$ is the model prediction and $f(x)$ is the true label. As the name suggests, the irreducible error stems from noise in the data and cannot be reduced through the model. Bias is introduced through assumptions that are made in the model before the training. The more flexible a model is, the lower is its bias. Variance is related to generalization: It measures how much the model predictions would change if trained on different data. High variance indicates strong overfitting and poor generalization.
   In many machine learning models one can vary the model flexibility. A more flexible model is able to learn more complex relationships and will therefore reduce the bias of the predictions. At the same time, a more flexible model increases the likelihood of overfitting to the training data and thereby of increasing the variance. The search for the optimum model complexity to achieve both low bias and low variance is commonly referred to as the bias-variance tradeoff.
   Binary decision trees are very flexible and tend to overfit strongly to the training data. Hence, they have a low bias and a high variance. Random forests base their predictions on a number of decorrelated decision trees. Decorrelation is achieved by bagging, which is the training on random subsets of the training data, as well as randomly sampling the active variables at each split. Since the trees are decorrelated, the variance of the random forest predictions is reduced and generalization improved. At the same time, random forests are able to keep the low bias of the decision trees. This makes random forests, despite their simplicity, powerful predictors for a range of applications (Breiman 2001). The random forest is implemented using the OpenCV library.
   In the present scenario, i.e. an incompressible turbulent flow, a set of twelve features was chosen. In order to be able to generalize to cases other than the training data set, all features are non-dimensional. The first eight features were non-dimensionalized such that they lie within the interval [−1, 1]. The other ones were non-dimensional quantities common in fluid mechanics, as well as a marker function indicating regions in which the turbulence model is expected to be inaccurate (Gorlé et al. 2014). The computation of the features requires knowledge of the following variables, which are either constant or solved for during the RANS calculations: the mean velocity and its gradient, the turbulent kinetic energy as well as its production and its dissipation rates, the minimum wall distance, the molecular viscosity, and the speed of sound. A list of features is given in table 1.

Figure 5: Training error (solid) and validation error (dotted) vs. number of trees for different maximum tree depth values.

Figure 6: Training error (solid) and validation error (dotted) vs. number of trees for different minimum sample counts.

Figure 7: Training error (solid) and validation error (dotted) vs. number of trees for different active variable counts.

Figure 8: Predictions vs. labels on wavy wall training data.

   The periodic wavy wall case was used to obtain training data for the random regression forest model. It is defined as a turbulent channel with a flat top wall and a sinusoidal bottom. The ratio of channel height to wave length is
H/λ = 1.0, and the ratio of wave height to wave length is 2A/λ = 0.1. The wave is repeating periodically, and on its downward slope a flow separation occurs. An unperturbed baseline calculation was run using the k − ε turbulence model with periodic inflow and outflow boundary conditions. A mesh convergence study, as done for the diffuser, suggested only limited numerical errors. Features for the random forest training were computed from the baseline case, and labels were computed from both the baseline case and higher fidelity data. The labels are defined as the actual distances in the barycentric domain between the location predicted by the baseline calculation and the higher fidelity one. The higher fidelity data is DNS data from Rossi (2006).
   Hyperparameters are parameters in machine learning models that are not learned during model training, but that are instead set before training and used to define the functional form of the model and control the learning process. The impact of four different hyperparameters on the learning of the random regression forest model is studied: the maximum tree depth, the minimum sample count, the active variable count, and the number of trees. The minimum sample count is the minimum number of samples required at a particular tree node in order to do further splitting. The active variable count is the number of features randomly chosen at each node to find the optimal split.
   For each of the first three hyperparameters a couple of different values were tested over a range of 1 to 200 regression trees. To improve readability, figures 5 to 7 show the results for every third number of trees only. The dataset was split into an 80% training set and a 20% validation set for this part. Only after making a choice on the hyperparameters, a final random forest was trained on the full dataset to achieve best performance when employed at the test case.
   The results from the maximum tree depth study are shown in figure 5. The training and validation errors are plotted in solid and dotted lines, respectively, against the number of trees. The tested values are 5, 10, 15, and 20, and best performance was achieved for 15 and 20. A depth of 15 was chosen, because a smaller value means smaller computational costs. Figure 6 shows the results from the minimum sample count study. The tested values are 10, 20, and 30. While the larger values did slightly better in terms of generalization, 10 overall showed the smallest error and was chosen as minimum sample count. The number of active variables was varied between 2 and 10 at an increment of 2 as shown in figure 7. Larger numbers of active variables lead to lower training and validation errors, with 8 and 10 yielding the smallest errors.
   With the hyperparameters fixed, the final random forest was trained on the full wavy wall dataset for use on the diffuser case. Figure 8 shows a scatter plot of predictions vs. labels for the training data. There is a good agreement between the predicted and the true perturbation strengths.
   The OpenCV library used to train the random regression forest allows for the computation of feature importance, i.e. a quantitative assessment of the impact that each feature has on the final prediction.
   The maximal information coefficient (MIC) is a measure of dependence between variables. It is able to detect both lin-
                                                                   Figure 10: Data driven, local eigenvalue perturbation. Pro-
                                                                   files of streamwise velocity at different x locations.
Figure 9: Normalized random forest feature importance
scores (squares) and MIC scores (diamonds) for the features.

ear and more complex relationships, and it has shown good
equitability (Reshef et al. 2016). We can estimate the MIC
between the features and the labels and compare the scores to the
feature importance scores from the random forest. The MIC was
estimated using the tools provided by Albanese et al. (2018).
Figure 9 presents the normalized feature importance scores from
the random forest as squares and the estimated MIC scores as
diamonds.
   Two of the most important features of the random forest are
#11, the non-dimensional wall distance, and #12, the marker
indicating deviation from parallel shear flow (Gorlé et al. 2012).
Challenging flow features such as flow separation occur at or near
the wall, supporting the significance of the wall distance. The
marker function was developed specifically to identify regions
where the linear eddy viscosity assumption becomes invalid. The
importance of this feature indicates that the model was able to
recognize this relationship. The features based on combinations of
the mean rate of rotation were clearly more important than the
ones based on combinations of the mean rate of strain, with #4,
the trace of the squared mean rotation rate tensor, being ranked
as the second most important feature. Another important feature is
#6, the Q criterion, which identifies vortex regions. As expected,
#1, the divergence of the velocity, is not a significant feature
for this incompressible flow case. It is important to point out
that the feature importance is strongly related to the baseline
turbulence model.
   The feature importance scores show some trends that are also
captured by the MIC, e.g. the ranking between the first five
features. This increases our confidence in the learning of the
machine learning model. At the same time, we notice that there is
no perfect agreement. For example, #10, the turbulence intensity,
was ranked first by the MIC while being less important to the
random forest predictions than other features. This leaves room
for more detailed investigations.
   Finally, this new, data driven framework was applied to the
planar asymmetric diffuser. The random forest model was used to
predict a local perturbation strength at every cell during the
RANS calculations. The data driven eigenvalue perturbations led to
an increase in costs for the RANS simulations. Compared to the
calculations with the baseline k-ε turbulence model, the observed
runtime increased by a factor of 2 to 3. There are two reasons for
this increase. First, the calculations associated with the
Reynolds stress perturbations took some time; and second, the
perturbations had an effect on the convergence of the solver,
resulting in more iterations that had to be completed depending on
the particular limiting state. The time of computing the Reynolds
stress perturbations was dominated by the evaluation time of the
random forest, which scales linearly with the number of trees, a
number that potentially could be reduced.
   As for the data free uncertainty envelopes, three perturbed
calculations were carried out for the three limiting states of
turbulence. Figure 10 shows the results. The uncertainty envelopes
still display the same general trend, suggesting an overprediction
of the streamwise velocity in the lower half of the channel. As
expected from permitting smaller perturbation strengths, the
envelopes are narrower than they are when using the data free
framework from the previous section. There are no regions where
the uncertainty is substantially overestimated: in most regions,
the envelopes reach just up to or at least very near to the
experimental data. Thus, for the test case, the data driven
uncertainty estimates give a reasonable estimate of the modeling
errors and therefore the true uncertainty in the flow predictions.

                    Conclusion and Future Work

In this investigation, we outline a physics constrained data
driven framework for uncertainty quantification of turbulence
models. We describe a methodology that introduces physics
constrained perturbations to estimate structural uncertainty in
turbulence models while retaining the realizability constraints on
the Reynolds stresses. We then utilize machine learning algorithms
to infer these perturbations from labeled data. These two steps
together ensure that this framework is both physics constrained
and data driven. Finally, we integrate the resulting library into
CFD software and carry out tests for robustness and reliability.
At present, we are testing this framework using different baseline
flows and different machine learning algorithms. The software
implementation of this physics constrained machine learning
library for turbulence model uncertainty quantification will be
released soon.

                            References

Albanese, D.; Riccadonna, S.; Donati, C.; and Franceschi, P.
2018. A practical tool for maximal information coefficient
analysis. GigaScience 7(4). ISSN 2047-217X.
doi:10.1093/gigascience/giy032. Giy032.

Breiman, L. 2001. Random Forests. Mach. Learn. 45(1): 5–32.
ISSN 0885-6125. doi:10.1023/A:1010933404324.

Breiman, L.; Friedman, J.; Stone, C. J.; and Olshen, R. A. 1984.
Classification and Regression Trees. Taylor & Francis.
ISBN 9780412048418.

Buice, C. U.; and Eaton, J. K. 2000. Experimental Investigation of
Flow Through an Asymmetric Plane Diffuser: (Data Bank
Contribution). Journal of Fluids Engineering 122(2): 433–435.
ISSN 0098-2202. doi:10.1115/1.483278.

Craft, T. J.; Launder, B. E.; and Suga, K. 1996. Development and
application of a cubic eddy-viscosity model of turbulence.
International Journal of Heat and Fluid Flow 17(2): 108–115.
ISSN 0142-727X. doi:10.1016/0142-727X(95)00079-6.

Duraisamy, K.; Iaccarino, G.; and Xiao, H. 2019. Turbulence
Modeling in the Age of Data. Annual Review of Fluid Mechanics
51(1): 357–377. doi:10.1146/annurev-fluid-010518-040547.
URL https://doi.org/10.1146/annurev-fluid-010518-040547.

Gorlé, C.; Emory, M.; Larsson, J.; and Iaccarino, G. 2012.
Epistemic uncertainty quantification for RANS modeling of the flow
over a wavy wall. Center for Turbulence Research Annual Research
Briefs.

Gorlé, C.; Larsson, J.; Emory, M.; and Iaccarino, G. 2014. The
deviation from parallel shear flow as an indicator of linear
eddy-viscosity model inaccuracy. Physics of Fluids 26(5): 051702.
doi:10.1063/1.4876577.

Hanjalić, K.; and Launder, B. E. 1972. A Reynolds stress model of
turbulence and its application to thin shear flows. Journal of
Fluid Mechanics 52(4): 609–638. doi:10.1017/S002211207200268X.

Ling, J.; and Templeton, J. 2015. Evaluation of machine learning
algorithms for prediction of regions of high Reynolds averaged
Navier Stokes uncertainty. Physics of Fluids 27(8): 085103.
doi:10.1063/1.4927765.

Milani, P. M.; Ling, J.; Saez-Mischlich, G.; Bodart, J.; and
Eaton, J. K. 2017. A Machine Learning Approach for Determining the
Turbulent Diffusivity in Film Cooling Flows. Journal of
Turbomachinery 140(2). ISSN 0889-504X. doi:10.1115/1.4038275.
021006.

Obi, S.; Aoki, K.; and Masuda, S. 1993. Experimental and
Computational Study of Turbulent Separating Flow in an Asymmetric
Plane Diffuser. In 9th International Symposium on Turbulent Shear
Flows, 305. Kyoto, Japan.

Reshef, Y. A.; Reshef, D. N.; Finucane, H. K.; Sabeti, P. C.; and
Mitzenmacher, M. 2016. Measuring Dependence Powerfully and
Equitably. Journal of Machine Learning Research 17(211): 1–63.
URL http://jmlr.org/papers/v17/15-308.html.

Rossi, R. 2006. Passive scalar transport in turbulent flows over a
wavy wall. Ph.D. thesis, Università degli Studi di Bologna,
Bologna, Italy.

Schobeiri, M. T.; and Abdelfattah, S. 2013. On the Reliability of
RANS and URANS Numerical Results for High-Pressure Turbine
Simulations: A Benchmark Experimental and Numerical Study on
Performance and Interstage Flow Behavior of High-Pressure Turbines
at Design and Off-Design Conditions Using Two Different Turbine
Designs. Journal of Turbomachinery 135(6). ISSN 0889-504X.
doi:10.1115/1.4024787. 061012.

Schumann, U. 1977. Realizability of Reynolds-stress turbulence
models. The Physics of Fluids 20(5): 721–725.

Wang, Q.; and Dow, E. A. 2010. Quantification of structural
uncertainties in the k-omega turbulence model. Center for
Turbulence Research Proceedings of the Summer Program.

Wu, J.-L.; Xiao, H.; and Paterson, E. 2018. Physics-informed
machine learning approach for augmenting turbulence models: A
comprehensive framework. Physical Review Fluids 3(7).
ISSN 2469-990X. doi:10.1103/physrevfluids.3.074602.