A weighted sparse-input neural network technique applied to identify important features for vortex-induced vibration
Leixin Ma, Themistocles L. Resvanis, J. Kim Vandiver
Department of Mechanical Engineering, Massachusetts Institute of Technology
leixinma@mit.edu


Abstract
Flow-induced vibration depends on a large number of parameters or features. On the one hand, the number of candidate physical features may be too large to construct an interpretable and transferable model. On the other hand, failure to account for key dependence among features may oversimplify the model. Feature selection can reduce the dimension of the physical problem by identifying the most important features for a given prediction task. In this paper, a weighted sparse-input neural network (WSPINN) is proposed, in which prior physical knowledge is leveraged to constrain the neural network optimization. The effectiveness of this approach is evaluated on the vortex-induced vibration of a long flexible cylinder with Reynolds numbers from $10^4$ to $10^5$. The important physical features affecting the flexible cylinder's crossflow vibration amplitude are identified.

Copyright © 2020, for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Introduction

Vortex-induced vibration (VIV) is a multi-physics problem associated with a number of features (or variables) that characterize the structure, the flow, or their interaction. As flow passes around a cylinder, the wake becomes unstable, and the periodically shed vortices induce unsteady forces on the cylinder that lead to VIV. Moreover, the VIV of long cylinders in ocean currents may vary from single-mode-dominated, narrow-band random vibration to multi-mode response characterized by broad-band random vibration. Different current profiles may cause structural vibration with standing-wave or travelling-wave patterns (Bourguet et al. 2011; Vandiver et al. 2018). The complexity of the nonlinear fluid-structure interaction process, especially for the VIV of long, flexible cylinders in high-Reynolds-number flows, precludes exact analytical solutions, and CFD simulations are not yet up to the task.

To identify the key mechanisms and the governing dimensionless parameters behind this complicated fluid-structure interaction process, extensive investigations have been made through structural response measurement, flow visualization and various force modeling techniques (Sarpkaya 2004). VIV research over the past decades has revealed that the Strouhal number, Reynolds number, mass ratio, damping parameter, etc. are all relevant VIV features (Vandiver 1993; Govardhan and Williamson 2006; Vandiver et al. 2018). However, if one is only interested in predicting a particular quantity of interest, such as the cylinder's vibration amplitude in the crossflow direction, some of these candidate features may be redundant or unimportant.

Feature selection algorithms are intended to extract the most important features from the full set of candidate features, keeping prediction accuracy at a desirable level with a reduced set of features. A preliminary analysis of feature importance can be conducted by examining the statistical correlations among the features. However, such statistical analysis often fails to capture the complicated interactions among the physical input parameters (features). To address this, the importance of each feature subset can be assessed by its prediction accuracy using a learning machine, such as a deep neural network (DNN). Several learning-machine-based feature selection approaches have been developed to iteratively search for the optimal feature subset that gives prediction accuracy similar to that of the full feature set, but they can be computationally expensive, especially when the number of input features becomes very large (Guyon 2003).

To identify important features efficiently within a learning machine, several regularization techniques have been introduced into the machine learning process. Rudy et al. (2017) developed a sequential threshold ridge regression, which helped discover the governing partial differential equations of a system from measured time series. Inspired by the effectiveness of group lasso regularization in linear regression, Feng (2017) and Scardapane (2017) developed a sparse-input neural network by imposing group lasso regularization on the weight groups connecting each input neuron. The effectiveness of the approach was demonstrated through theoretical derivations and empirical evidence.

However, for physical problems, some of the system's properties may be known in advance or can be obtained from the governing physical laws and dimensional analysis (Sonin 2001). Studies have shown that incorporating prior physical knowledge can help build more interpretable machine learning models (Ye et al. 2018).

In this paper, the sparse-input neural network proposed by Feng (2017) and Scardapane (2017) is modified to efficiently identify the important features on top of prior physical information. Comparison with an exhaustive search over all combinations of additional features shows its effectiveness in building compact predictive models while retaining the prior physical information. The method was applied to the problem of predicting the VIV response amplitude at the dominant vibration frequencies. In addition to the Reynolds number and the damping parameter, the in-line/cross-flow coupling and the modal participation are found to be important global VIV features.

Weighted sparse-input neural network (WSPINN) incorporating prior physical knowledge

We consider a fully connected DNN with P input features $\mathbf{x} \in \mathbb{R}^P$ in the input layer and M neurons in the first hidden layer that predicts a target output $y \in \mathbb{R}^1$. The weight connecting the pth input feature and the mth neuron in the first hidden layer is denoted $w_{pm}$. Figure 1 shows an example of the DNN with P = 3 and M = 4. The sparse-input neural network (Feng 2017; Scardapane 2017) aims to accomplish two tasks simultaneously. On the one hand, it minimizes $L(y, \hat{y})$, the prediction error (or loss) between the predicted $\hat{y}$ and the measured $y$. On the other hand, it constrains the number of input features to the DNN to be no greater than k. To implement this constraint, the weights outgoing from the same input feature are grouped together, and the number of non-zero weight groups is limited to no more than k. Hence, the optimization objective can be expressed as

$$
\min_{\mathbf{w}} L(y, \hat{y}) \quad \text{subject to} \quad \|\mathbf{w}\|_0 = \left|\left\{p : \|W_p\| \neq 0\right\}\right| = \left|\left\{p : \sqrt{\sum_{m=1}^{M} w_{pm}^2} \neq 0\right\}\right| \le k \tag{1}
$$

where $\|\mathbf{w}\|_0$ is the $l_0$ norm of the weight vector $\mathbf{w}$, $|\cdot|$ is the cardinality of the set of weight groups, and p and m are the indices for the input feature and the neuron in the first hidden layer, respectively. The magnitude of the weight group for feature p is measured by $\|W_p\|$. Since the $l_0$ norm is non-convex and non-differentiable, the $l_1$ norm, which is the sum of the absolute values of a vector's entries, is often used as a convex proxy (Tibshirani 1996). It can be shown geometrically that the $l_1$ norm is the closest convex approximation of the $l_0$ norm (Rosasco 2010). Applying this convex approximation to the group norms, we obtain Equation (2),

$$
\min_{\mathbf{w}} \frac{1}{N}\sum_{n=1}^{N} L\left(y^{(n)}, \hat{y}^{(n)}\right) + \lambda \sum_{p \in P} \sqrt{\sum_{m=1}^{M} w_{pm}^2} \tag{2}
$$

where N is the number of training samples. The second term in Equation (2) is a regularization term that deliberately biases the prediction in favor of sparsity. The hyperparameter λ, known as the group lasso penalty, trades off the sparsity of the input features against the prediction accuracy. As λ grows, the neural network tries harder to minimize the sum of the weight-group norms, so more weight groups are likely to shrink to near 0. The input features with nonzero weight groups are the remaining features that contribute to the prediction. In this way, the model can be built from fewer input features, although the prediction accuracy may decrease due to the loss of information.

Figure 1: A DNN example with 3 inputs and 4 neurons in the first hidden layer

However, for many physical problems, some of the features are known in advance to be important; these are termed prior knowledge. In this case, the objective is to select a small number of additional features that complement the prior-knowledge features and lead to predictions of acceptable accuracy. Since the conventional sparse-input neural network cannot distinguish between prior-knowledge and additional features, the optimization objective in Equation (2) is modified as follows:

$$
\min_{\mathbf{w}} L(y, \hat{y}) + s_p \lambda \sum_{p \in P_p} \sqrt{\sum_{m=1}^{M} w_{pm}^2} + s_a \lambda \sum_{p \in P_a} \sqrt{\sum_{m=1}^{M} w_{pm}^2} \tag{3}
$$

where $P_p$ denotes the set of features representing prior knowledge and $P_a$ denotes the set of all additional features. The parameters $s_p$ and $s_a$ are the weights assigned to the prior-knowledge and additional features, respectively. These weights represent the level of confidence in the features' importance for prediction (Lian 2018). The conventional sparse-input neural network in Equation (2) is a special case of the weighted formulation in Equation (3),
where $s_a = s_p = 1$, which assumes equal confidence in the importance of all features. For the prior-knowledge features (i.e., the feature set known to be important), we want to prevent the algorithm from shrinking their weight groups to near 0; hence $s_p/s_a$ should be set close to 0.

Relevant features for long flexible cylinders subjected to vortex-induced vibrations

Flexible cylinder VIV modeling
Figure 2 is a sketch of a tensioned elastic cylinder in a linearly sheared current profile U(z) distributed along the axis z, which causes the cylinder to vibrate in both the inline (IL) and crossflow (CF) directions with respect to the incoming current. The vibration of the elastic cylinder can be approximated as that of a tensioned Euler-Bernoulli beam. The equations of motion in the crossflow and inline directions can be expressed as

$$
m(z)\frac{\partial^2 y}{\partial t^2} + c_s(z)\frac{\partial y}{\partial t} - P(z,t)\frac{\partial^2 y}{\partial z^2} + EI(z)\frac{\partial^4 y}{\partial z^4} = F_{cf} \tag{4}
$$

$$
m(z)\frac{\partial^2 x}{\partial t^2} + c_s(z)\frac{\partial x}{\partial t} - P(z,t)\frac{\partial^2 x}{\partial z^2} + EI(z)\frac{\partial^4 x}{\partial z^4} = F_{il} \tag{5}
$$

where x and y are the displacements in the inline and crossflow directions, m(z) is the cylinder's mass per unit length, P(z,t) is the tension of the vibrating cylinder, EI(z) is the bending stiffness, $c_s$ is the structural damping coefficient per unit length, and $F_{cf}$ and $F_{il}$ are the vortex-induced forces on the cylinder. The loading transfers energy from the fluid to the structure in a well-defined region of length $L_{in}$, the "power-in" region. Outside this region, the vortex loading dissipates energy, transferring it from the structure to the fluid through the hydrodynamic damping coefficient $c_h(z)$. The location of the "power-in" region can be identified from structural vibration measurements in experiments or simulations (Rao 2015). Under steady-state, narrow-banded vibration, the total power dissipation in the flexible pipe can be normalized to an equivalent damping coefficient $c_e$ (Vandiver et al. 2018).

Figure 2: Side and front view of a cylinder under VIV

The VIV loading is the result of nonlinear interaction between vortex shedding and structural vibration via complicated feedback mechanisms that depend on the structural properties, the current profile and the structure's motion at every instant. Hence, the parameterization of the VIV force in the "power-in" region may involve

$$
F_{cf} = f\left(U(z), x(z,t), y(z,t), \rho, \mu, L, L_{in}, D, m(z), c_{s,cf}(z), c_{h,cf}(z), P(z,t), EI(z)\right) \tag{6}
$$

$$
F_{il} = f\left(U(z), x(z,t), y(z,t), \rho, \mu, L, L_{in}, D, m(z), c_{s,il}(z), c_{h,il}(z), P(z,t), EI(z)\right) \tag{7}
$$

where ρ, µ, L and D are the fluid density, the fluid dynamic viscosity, and the cylinder's length and diameter, respectively.

If the spatiotemporal root-mean-square (rms) amplitude of crossflow vibration $A_{rms,cf}$ in the power-in region is the target output, then from Equations (4)-(7) the predictive model can be expressed as

$$
A_{rms,cf} = f\left(U(z), x(z,t), y(z,t), \rho, \mu, L, L_{in}, D, m(z), c_{s,cf}(z), c_{h,cf}(z), P(z,t), EI(z)\right) \tag{8}
$$

Equations (6)-(8) involve the spatial-temporal distributions of the structural response and the system properties, which will be further simplified and represented by a set of global VIV features.

Spatial-temporal analysis for typical VIV
VIV measurements from the 2011 Shell experiments on a 38-m-long cylinder (Lie et al. 2013) were studied in this investigation.

The measured crossflow displacements in a linearly sheared current are presented in Figure 3. The top panels show the CF response time series at two locations within the "power-in" region, and the corresponding wavelet analysis is shown at the bottom. The vibration is narrow-banded, with the dominant frequency $\omega_{cf}$ drifting in time. Given the dominant vibration frequency and the structural properties, the corresponding wavenumber $k_{cf}$ can be estimated from the dispersion relationship.

Figure 3: Top: Time series of crossflow displacement at two locations in the power-in region; Bottom: Wavelet analysis of the measured time series. (2011 Shell experiment, D = 30 mm, linearly sheared flow, $U_{max}$ = 1.6 m/s)
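For a uniform cylinder, the dispersion relationship follows from Equation (4): dropping the damping and forcing terms and substituting a travelling-wave solution $y \propto e^{i(kz - \omega t)}$ gives $m\omega^2 = Pk^2 + EIk^4$, which is a quadratic in $k^2$. The sketch below solves it for k. It is a minimal illustration that assumes constant m, P and EI along the span and neglects damping; whether m should include hydrodynamic added mass is left to the user, and the function name `dispersion_wavenumber` is hypothetical rather than part of the authors' workflow.

```python
import math


def dispersion_wavenumber(omega: float, m: float, tension: float, ei: float) -> float:
    """Positive wavenumber k (rad/m) of a uniform, undamped tensioned beam.

    Solves m*omega**2 = tension*k**2 + ei*k**4, obtained by substituting
    y ~ exp(i(k*z - omega*t)) into Equation (4) with c_s = 0 and F_cf = 0,
    and with m, P (tension) and EI taken as constants along the span.
    """
    if m <= 0.0 or tension < 0.0 or ei < 0.0:
        raise ValueError("expected m > 0, tension >= 0 and ei >= 0")
    if omega == 0.0:
        return 0.0
    # Positive root q = k**2 of ei*q**2 + tension*q - m*omega**2 = 0, written as
    # 2*m*omega**2 / (T + sqrt(T**2 + 4*EI*m*omega**2)) to avoid cancellation
    # when the bending stiffness is small compared with the tension.
    root = math.sqrt(tension**2 + 4.0 * ei * m * omega**2)
    denom = tension + root
    if denom == 0.0:
        raise ValueError("tension and bending stiffness cannot both be zero")
    q = 2.0 * m * omega**2 / denom
    return math.sqrt(q)
```

With `ei = 0` the sketch reduces to the taut-string result $k = \omega\sqrt{m/P}$; for example, `dispersion_wavenumber(10.0, 2.0, 8.0, 0.0)` returns `5.0`. The resulting $k_{cf}$ is the wavenumber that enters the groups $Lk_{cf}$ and $P/(EIk_{cf}^2)$ of Equation (10).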
   Meanwhile, Figure 4 shows the corresponding spatial-             Arms ,cf f (U rms , ∆U , Arms ,il , ωcf , ωil , k cf , k il , α cf , α il ,
                                                                    =
temporal distribution of crossflow displacement for the                                                                                                             (9)
                                                                                 κ cf , κ il , ρ , µ , L, L in , D, m, cs , ce,cf , ce,il , P0 , P, EI )
same test condition. The response is nonstationary, with a
mixture of standing wave and travelling wave components.          Where Urms and ∆U / L are the spatial root-mean-square
To better capture the temporal variation of the vibration         and the shear gradient of the current profile, respectively
signal, a moving window analysis is conducted. The vibra-         within the power-in region. Arms,cf and Arms,il are the spatio-
tion signal is windowed into overlapping time frames over         temporal rms for the crossflow and inline VIV amplitude
each 3 vibration cycles, with 75% overlap.                        in the power-in region. cs and ce are respectively, the struc-
   Complex proper orthogonal decomposition (POD) is               tural damping coefficient and the equivalent rigid cylinder
conducted on the crossflow displacement in each spatial-          damping coefficients that will lead to the same power dis-
temporal window in the “power-in” region to decompose             sipation as discussed by (Vandiver et.al. 2018). P0 and P
the displacement in each window into several orthogonal           are the initial tension before VIV and the mean tension
complex modes (Feeny 2008). The ratio between the mod-            during the VIV process, respectively.
al energy of the dominant POD mode and the total energy              Non-dimensionalizing Equation (9) gives,
is defined as κ , which suggests the dominance of the prin-          Acf* = f (Re, β , Ail* ,Vrcf ,Vril , Lk cf , α cf , α il , κ cf , κ il ,
cipal mode. Additionally, by comparing the real and imag-                                                                  P L2                                    (10)
                                                                                      L / L in , L / D, m*ζ , ccf* , cil* , 0             , P              )
inary component of the dominant complex mode, the trav-                                                                              EI         ( EIk cf2 )
elling wave index α can be defined, with α = 1 for travel-        Where Acf* = Arms , cf / D , Ail* = Arms ,il / D are the dimension-
ling waves, and α = 0 for standing waves (Feeny 2008).            less crossflow and inline response amplitude, respectively.
The middle and bottom of Figure 4 shows the temporal               Re = ρU rms D / µ is Reynolds number;
                                                                                                      =          β ( D / U rms )( ∆U / L )
variation of the travelling wave index and the modal domi-
nance factor analyzed in the power-in region, which sug-
                                                                  is known as the shear parameter; Vrcf = 2π U / ωcf D ,                                       (    )
                                                                  Vril = 2π U / (ωil D ) are the crossflow and inline reduced
gests that the VIV process is single POD mode dominated,
but the mode may vary from standing to travelling waves.
                                                                  velocities, respectively; m*ζ = 4mζ / πρ D 2 is known as            (            )
                                                                  the mass damping parameter in the VIV literature, which
Analysis from inline vibration also shows similar spatial-        historically has been thought to be important in controlling
temporal distribution.                                            rigid cylinder’s VIV amplitude. ccf* = 2ce , cf ωcf / ρU rms     2
                                                                                                                                     and
                                                                   cil = 2ce,il ωil / ρU rms are the dimensionless forms of the
                                                                    *                    2

                                                                  equivalent damping parameter in the crossflow and inline
                                                                  directions, respectively (Vandiver et.al. 2018).
                                                                      Although Equation (10) suggests that crossflow response
                                                                  prediction in the “power-in” region may require consider-
                                                                  ing the effect of all the 17 dimensionless variables, it is
                                                                  likely that the dimension of the input features can be fur-
                                                                  ther reduced due to redundancy or correlation between
                                                                  features or irrelevance to the prediction target. We are in-
                                                                  terested in finding a smaller and more manageable subset
                                                                  of parameters that are ultimately the most important out of
                                                                  the full set when it comes to determining the CF response
Figure 4: Contour plot of CF vibration around the dominant fre-   amplitude. The motivation behind this is our interest to
quency, its corresponding travelling wave index α cf , and mode   understand what causes the CF response variability that is
dominance factor κ cf in the estimated “power-in” region          observed in the temporal domain. At the very least, we
(z/L=0~0.3)                                                       would like to start associating changes to certain parame-
                                                                  ters with that variability that is often observed but is too
Dimensional analysis for narrow-banded VIV                        complicated to understand.
process
For a homogeneous, tensioned cylinder in uniform or line-
                                                                     Feature selection for flexible cylinder VIV
arly sheared current undergoing narrow-banded VIV,
Equation (8) can be approximated by the following rele-
vant global quantities,
                                                                  Dataset description
                                                                  The dataset is from a set of experiments conducted by
                                                                  Shell Oil Co. in 2011 at Marintek. The vibration of 38-
                                                                  meter-long cylinders under various current profiles were
                                                                  measured. The test matrix included two pipes with differ-
ent diameters (30-mm and 80-mm) but of the same bend-             two parameters are designated to be used as prior
ing stiffness. The cylinders were tested in uniform and lin-      knowledge. The shear parameter β which is ideally suited
early sheared current profiles with the maximum flow              to differentiating between uniform or sheared flows was
speed, Umax, ranging from 0.5 m/s to 2.5 m/s. This resulted       the third parameter that was chosen as prior knowledge
in the Reynolds number Re ranging from 1.0 ×104 –                 before starting the feature selections process.
20×104. The dataset also included cases where the 80-mm
pipe was covered with strakes over 50% of its length. The         Feature selection on top of prior physical
pipe tests were conducted in uniform flows with Umax vary-        knowledge
ing from 0.5 m/s to 1.5 m/s. The strakes dissipated vibra-
                                                                  The feature selection procedure was conducted by increas-
tion energy and limited the power-in region to Lin = 0.5L ,
                                                                  ing the hyperparameter λ from 0.01 until all the input fea-
50% of the cylinder’s length. Detailed descriptions of the
                                                                  tures except the prior knowledge shrank to 0. Figure 5
experiments can be found in Lie (2013) and Rao (2015).
                                                                  demonstrates how varying the value of λ determines the
   The structural damping ratio ζ in the experiment was
                                                                  number of features chosen by the proposed algorithm. In
around 0.5% (Vandiver et.al. 2018). The cross flow re-
                                                                  the figure the retained features are indicated by the pres-
duced velocity Vrcf varies in a narrow range from 6 to 9
                                                                  ence of a black bar at each λ value tested.
and Vrcf / Vril ≈ 2 .

Deep neural network setup
The deep neural network was constructed using two hidden
layers. Each hidden layer had twenty neurons using a sig-
moid activation function. The total number of data points
was around 6000. 70% of the experimental data were used
as the training data, while the rest was used as the test data.
The input variables x were standardized to keep the fea-
tures at the same scale, while the output variables y were
normalized to values between 0 to 1. The mean absolute
percentage error (MAPE) was chosen as the loss function
between prediction and measurement L ( y, yˆ ) . The neural
network optimization was conducted via FTRL algorithm
(McMahan 2013). During neural network training, the
batch size was 128 and learning rate was 0.01. The sa and
sp in Equation (3) were fixed to be 1 and 0.02, respectively.
After the optimization, we remove the input features whose        Figure 5: Top: The variation of remaining features with λ. The
magnitude of the weight groups have shrunk to near 0 from         black bars represent the features selected by the neural network.
prediction model. In this paper, the magnitude of a weight        Bottom: Comparison of the prediction error (MAPE) between
group is considered to be near 0 when it’s value is less than     weighted sparse-neural network (WSPINN) prediction and the
5% of the maximum magnitude among all of the input fea-           DNN prediction using combinatorically searched features under a
tures.                                                            given number of features

Prior physical knowledge for VIV

Experimental studies on small spring-mounted rigid cylinders show
that the response amplitude increases with increasing Reynolds
number in the range $10^3$ to $10^4$ and decreases as the
dimensionless damping increases (Govardhan & Williamson 2006,
Vandiver 2012).
   Similarly, studies on long flexible cylinders have shown that the
Reynolds number and the dimensionless damping continue to play
important roles in the VIV response amplitude (Resvanis 2012, Rao
2015). However, as discussed earlier, the large number of potentially
relevant parameters and the response variability result in scatter
in the data.
   Because it is known that the Reynolds number $Re$ and the
dimensionless damping parameter $c^*_{cf}$ are important, these

   The prediction error varies with the features retained in the
prediction model, as shown in the bottom part of Figure 5. For each
number of features, a brute-force search over all possible
combinations of the additional features was also carried out: the
WSPINN error is compared with hundreds of runs of DNN predictions
that use combinatorially searched features in addition to the 3
features representing prior knowledge. The comparison suggests that
WSPINN is able to find the feature subsets that give the smallest
prediction error among all feature combinations. It can also be
observed that multiple feature combinations can give similar
prediction accuracy. For example, both the additional feature pair
($A^*_{il}$, $\kappa_{cf}$) and the pair ($A^*_{il}$, $\kappa_{il}$)
give a prediction error of around 13%. This suggests correlations and
interactions among some of the VIV features.
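The brute-force baseline can be sketched as an exhaustive loop over subsets of the candidate features that always keeps the prior-knowledge features. The `score` callable is a hypothetical stand-in for training a DNN on the given inputs and returning its MAPE; the labels used in the test (`"Re"`, `"beta"`, `"c_cf"` and the placeholder candidates) are illustrative, not the paper's full feature list.

```python
from itertools import combinations
from math import comb


def brute_force_search(prior, candidates, k, score):
    """Evaluate every set of k extra candidates on top of the prior features.

    score(features) is a user-supplied callable (hypothetical here), e.g. train a DNN
    on those inputs and return its MAPE; lower is better. Returns (best_error, features).
    """
    best = None
    for extra in combinations(candidates, k):
        features = list(prior) + list(extra)
        err = score(features)
        if best is None or err < best[0]:  # keep the first subset with the lowest error
            best = (err, features)
    return best


def n_runs(n_candidates, k):
    """Number of models the brute-force search has to train for a given k."""
    return comb(n_candidates, k)
```

With 14 candidate features, k = 2 already requires 91 trained models and k = 3 requires 364, while all subsets together number 2^14 = 16,384; this combinatorial growth is what an embedded selection method such as WSPINN avoids.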
   After balancing the prediction accuracy against the sparsity of
the input features, we find that the 5-feature subset $Re$, $\beta$,
$c^*_{cf}$, $A^*_{il}$, $\kappa_{cf}$ gives 13% MAPE, which is close
to the 10.6% MAPE obtained using all 17 features.
   We have also applied the WSPINN algorithm to other VIV-related
problems, such as predicting the VIV amplitude of rigid cylinders and
the higher-harmonic VIV amplitude of flexible cylinders. Because of
space limitations, these results are not presented here. Moreover,
since this paper only studied the important parameters for VIV in
sheared and uniform current profiles, the importance of the features
may be different for more complicated current profiles.

Figure 6: Contours of CF RMS amplitude as a function of $c^*_{cf}$ and $Re$ (in uniform flow)


          Physical insight interpretation
The importance of the identified features for flexible cylinder VIV
can be examined by systematically varying the ranges of the input
features to the constructed neural network models. Figure 6 and
Figure 7 show the effect of varying $Re$ and $c^*_{cf}$ while holding
the other variables in the prediction model at the characteristic
values most often observed in the Shell experiments. The black dots
are the experimental measurements within 20% of the reference values
and are included to demonstrate that the prediction model (contours)
did in fact have data in that vicinity.
   The results demonstrate that increasing the Reynolds number tends
to increase the spatiotemporal CF RMS amplitude. This Reynolds
number effect is obvious in the uniform flow data, which typically
have small dimensionless damping values ($c^*_{cf}$ < 0.3-0.4). In
contrast, the Reynolds number effect is virtually non-existent in the
sheared flow cases, which have larger damping parameters
($c^*_{cf}$ > 0.4).

Figure 7: Contours of CF RMS amplitude as a function of $c^*_{cf}$ and $Re$ (in sheared flow)
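The contour construction behind Figures 6 to 8 amounts to evaluating a trained model on a two-dimensional grid while the remaining inputs stay at reference values. A minimal sketch, assuming `predict` is any fitted model mapping an (n_samples, n_features) array to predictions, and interpreting "within 20% of the reference values" as a relative tolerance on every input other than the two swept ones (an assumption; reference values must be non-zero). Both function names are hypothetical.

```python
import numpy as np


def sweep_two_features(predict, reference, i, j, xi, xj):
    """Predict on a grid over features i and j, holding all other inputs at reference values.

    Returns an array of shape (len(xj), len(xi)), suitable for contour plotting.
    """
    reference = np.asarray(reference, dtype=float)
    Xi, Xj = np.meshgrid(xi, xj)  # both of shape (len(xj), len(xi))
    X = np.tile(reference, (Xi.size, 1))  # one row per grid point, all at reference values
    X[:, i] = Xi.ravel()
    X[:, j] = Xj.ravel()
    return np.asarray(predict(X)).reshape(Xi.shape)


def near_reference(X, reference, i, j, tol=0.2):
    """Mask of measurements whose non-swept inputs lie within tol (relative) of the reference."""
    X = np.asarray(X, dtype=float)
    reference = np.asarray(reference, dtype=float)
    others = [k for k in range(X.shape[1]) if k not in (i, j)]
    rel = np.abs(X[:, others] - reference[others]) / np.abs(reference[others])
    return np.all(rel <= tol, axis=1)
```

The grid returned by `sweep_two_features` can be passed directly to a contour routine such as matplotlib's `contourf(xi, xj, Z)`, and `near_reference` selects which measurements to overlay as dots.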
   Figure 8 shows the effect of varying $A^*_{il}$ and $\kappa_{cf}$
while holding the other variables fixed. The crossflow response tends
to increase with the inline response. Such a relationship has also
been observed in VIV experiments on spring-mounted rigid cylinders
(Dahl 2008), where the fluctuating inline force increased with
crossflow motion. Finally, the prediction model suggests that as the
mode-participation factor increases, so does the CF response
amplitude. Note that both standing-wave and travelling-wave responses
can result in high mode-participation factors; in this situation, the
factor primarily characterizes whether all points on the flexible
cylinder are responding in a similar manner (spanwise coherence).

Figure 8: Contours of CF RMS amplitude as a function of $A^*_{il}$ and $\kappa_{cf}$ (in sheared flow)

          Special properties of the approach compared
                  to other machine learning methods
1. Direct, learning-task-dependent dimension reduction in the original
feature space while retaining the prior information in the model.
   WSPINN is a dimension reduction approach. However, unlike the
widely used PCA or auto-encoders, WSPINN reduces the dimension
directly in the original input feature space. Moreover, through
machine learning prediction, WSPINN is able to
identify the most important input features with respect to the
target output. The much smaller constraints placed on the
prior-knowledge features also allow them to be retained in the
prediction model, which improves prediction while additional
important features are identified.
2. High prediction accuracy due to the universal approximation
property of the DNN (Hornik 1993).
   WSPINN is a feature selection approach embedded in a DNN, which
is able to predict nonlinear input-output relationships accurately.
For instance, for crossflow VIV amplitude prediction given $Re$,
$\beta$, $c^*_{cf}$, $A^*_{il}$, $\kappa_{cf}$, the prediction errors
(MAPE) of the DNN and of linear regression are 13% and 25%,
respectively. However, training a DNN with WSPINN requires several
rounds of iterations to optimize the weights in each layer, so it was
found to be more computationally expensive than most other machine
learning methods. We consider the computational cost acceptable,
since our intention is not to create a fast predictive tool but
rather to use machine learning to reduce the dimensionality of the
problem as we try to understand the importance of each of the many
governing parameters.

                         Conclusion
In this paper, we propose modifications to a sparse-input neural
network so that it can efficiently select additional features that
complement a subset of features known in advance to be important
(i.e., prior knowledge). The algorithm was applied to experimental
results on the vortex-induced vibration of flexible cylinders. The
complicated spatiotemporal response measurements of the continuous
system are reduced to an equivalent 2-degree-of-freedom system. The
proposed algorithm is then used to investigate the roles of the
Reynolds number, the damping parameter and the shear parameter (3
parameters for which we have prior knowledge), as well as 14 other
parameters that dimensional analysis indicated might be important.
The algorithm was able to reduce the 14 additional parameters to just
2 additional parameters on top of the prior knowledge. We found that
this feature selection technique is much more efficient than a
brute-force combinatorial search.

                    Acknowledgement
This research has been sponsored by the members of the SHEAR7 Joint
Industry Project: BP, Chevron, ExxonMobil, Petrobras, SBM Offshore,
Shell International Exploration and Production, Equinor, & Technip
USA.

                         References
Bourguet, R., Karniadakis, G. and Triantafyllou, M., 2011. Vortex-induced vibrations of a long flexible cylinder in shear flow. Journal of Fluid Mechanics, 677, 342-382.
Dahl, J.M., 2008. Vortex-induced vibration of a circular cylinder with combined in-line and cross-flow motion. Doctoral dissertation, Massachusetts Institute of Technology.
Feeny, B.F., 2008. A complex orthogonal decomposition for wave motion analysis. Journal of Sound and Vibration, 310(1-2), 77-90.
Feng, J. and Simon, N., 2017. Sparse-input neural networks for high-dimensional nonparametric regression and classification. arXiv preprint arXiv:1711.07592.
Govardhan, R.N. and Williamson, C.H.K., 2006. Defining the 'modified Griffin plot' in vortex-induced vibration: revealing the effect of Reynolds number using controlled damping. Journal of Fluid Mechanics, 561, 147-180.
Guyon, I. and Elisseeff, A., 2003. An introduction to variable and feature selection. Journal of Machine Learning Research, 3(Mar), 1157-1182.
Hornik, K., 1993. Some new results on neural network approximation. Neural Networks, 6(8), 1069-1072.
Lian, L., Liu, A. and Lau, V.K., 2018. Weighted LASSO for sparse recovery with statistical prior support information. IEEE Transactions on Signal Processing, 66(6), 1607-1618.
Lie, H. et al., 2013. Comprehensive riser VIV model tests in uniform and sheared flow. In ASME 2012 31st International Conference on Ocean, Offshore and Arctic Engineering, 923-930.
McMahan, H.B. et al., 2013. Ad click prediction: a view from the trenches. In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 1222-1230. ACM.
Rao, Z., 2015. The flow of power in the vortex-induced vibration of flexible cylinders. Ph.D. dissertation, Department of Mechanical Engineering, Massachusetts Institute of Technology, Cambridge, MA.
Resvanis, T.L. et al., 2012. Reynolds number effects on the vortex-induced vibration of flexible marine risers. In ASME 2012 31st International Conference on Ocean, Offshore and Arctic Engineering, 751-760.
Rosasco, 2010. Statistical Learning Theory and Applications. Lecture notes, Massachusetts Institute of Technology.
Rudy, S.H., Brunton, S.L., Proctor, J.L. and Kutz, J.N., 2017. Data-driven discovery of partial differential equations. Science Advances, 3(4), e1602614.
Sarpkaya, T., 2004. A critical review of the intrinsic nature of vortex-induced vibrations. Journal of Fluids and Structures, 19(4), 389-447.
Scardapane, S., Comminiello, D., Hussain, A. and Uncini, A., 2017. Group sparse regularization for deep neural networks. Neurocomputing, 241, 81-89.
Sonin, A.A., 2001. Dimensional analysis. Technical report, Massachusetts Institute of Technology.
Tibshirani, R., 1996. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B (Methodological), 58(1), 267-288.
Vandiver, J.K., 1993. Dimensionless parameters important to the prediction of vortex-induced vibration of long, flexible cylinders in ocean currents. Journal of Fluids and Structures, 7(5), 423-455.
Vandiver, J.K., 2012. Damping parameters for flow-induced vibration. Journal of Fluids and Structures, 35, 105-119.
Vandiver, J.K., Ma, L. and Rao, Z., 2018. Revealing the effects of damping on the flow-induced vibration of flexible cylinders. Journal of Sound and Vibration, 433, 29-54.
Ye, T., Wang, X., Davidson, J. and Gupta, A., 2018. Interpretable intuitive physics model. In Proceedings of the European Conference on Computer Vision (ECCV), 87-102.