A weighted sparse-input neural network technique applied to identify important features for vortex-induced vibration

Leixin Ma, Themistocles L. Resvanis, J. Kim Vandiver
Department of Mechanical Engineering, Massachusetts Institute of Technology
leixinma@mit.edu

Copyright © 2020, for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Abstract

Flow-induced vibration depends on a large number of parameters or features. On the one hand, the number of candidate physical features may be too large to construct an interpretable and transferable model. On the other hand, failure to account for key dependence among features may oversimplify the model. Feature selection can reduce the dimension of the physical problem by identifying the most important features for a given prediction task. In this paper, a weighted sparse-input neural network (WSPINN) is proposed, in which prior physical knowledge is leveraged to constrain the neural network optimization. The effectiveness of this approach is evaluated on the vortex-induced vibration of a long flexible cylinder with Reynolds number from $10^4$ to $10^5$. The important physical features affecting the flexible cylinder's crossflow vibration amplitude are identified.

Introduction

Vortex-induced vibration (VIV) is a multi-physics problem associated with a number of features (or variables) that characterize either the structure or the flow individually or their interaction. As flow passes around a cylinder, the wake becomes unstable. The periodically shed vortices induce unsteady forces on the cylinder, which lead to VIV. Moreover, the VIV of long cylinders in ocean currents may vary from single-mode-dominated, narrow-band random vibration to multi-mode response characterized by broadband random vibration. Different current profiles may cause structural vibration with standing-wave or travelling-wave patterns (Bourguet et al. 2011; Vandiver et al. 2018). The complexity of the nonlinear fluid-structure interaction process, especially for the VIV of long, flexible cylinders in high-Reynolds-number flows, precludes exact analytical solutions, and CFD simulations are not yet up to the task.

To identify the key mechanisms and the governing dimensionless parameters behind the complicated fluid-structure interaction process, extensive investigations have been made through structural response measurement, flow visualization and various force modeling techniques (Sarpkaya 2004). VIV research over the past decades has revealed that Strouhal number, Reynolds number, mass ratio, damping parameter, etc. are all relevant VIV features (Vandiver 1993, Govardhan and Williamson 2006, Vandiver et al. 2018). However, if one is only interested in predicting a certain quantity of interest, such as the cylinder's vibration amplitude in the crossflow direction, some of these candidate features may be redundant or unimportant.

Feature selection algorithms are intended to extract the most important features out of the full set of candidate features, with the goal of keeping prediction accuracy at a desirable level with a reduced set of features. A pre-analysis of a feature's importance can be conducted by examining the statistical correlations among the features. However, such statistical analysis often fails to consider the complicated interactions among the physical input parameters (features). To solve this problem, the importance of each feature subset can be assessed according to its prediction accuracy using a learning machine, such as a deep neural network (DNN). Several learning-machine-based feature selection approaches have been developed to iteratively search for the optimal feature subset that gives prediction accuracy similar to that of the full feature set, but they can be computationally expensive, especially when the number of input features becomes very large (Guyon 2003).

To efficiently identify important features in a learning machine, several regularization techniques have been introduced to the machine learning process. Rudy et al. (2017) developed a sequential threshold ridge regression, which helped discover governing partial differential equations of a system from measured time series. Inspired by the effectiveness of group lasso regularization in linear regression, Feng (2017) and Scardapane (2017) developed a sparse-input neural network by imposing group lasso regularization on the weight groups connecting each input neuron. The effectiveness of the approach was demonstrated through theoretical derivations and empirical evidence. However, for physical problems, some of the system's properties may be known in advance or can be obtained from the governing physical laws and dimensional analysis (Sonin, 2001). Studies have shown that incorporating prior physical knowledge can help build more interpretable machine learning models (Ye et al. 2018).

In this paper, the sparse-input neural network proposed by Feng (2017) and Scardapane (2017) is modified to efficiently identify the important features on top of prior physical information. Comparison with searching all combinations of additional features shows its effectiveness in building compact predictive models while maintaining prior physical information. The method was applied to the VIV response amplitude prediction problem at dominant vibration frequencies. On top of the Reynolds number and damping parameter, the in-line-cross-flow coupling and modal participation are found to be important global VIV features.

Weighted sparse-input neural network (WSPINN) incorporating prior physical knowledge

We consider a fully connected DNN with P input features $x \in R^P$ in the input layer and M neurons in the first hidden layer that predicts a target output $y \in R^1$. The weight connecting the pth input feature and the mth neuron in the first hidden layer is denoted $w_{pm}$. Figure 1 shows an example of the DNN with P=3 and M=4.

Figure 1: A DNN example with 3 inputs and 4 neurons in the first hidden layer

The sparse-input neural network (Feng 2017; Scardapane 2017) aims at accomplishing two tasks simultaneously. On the one hand, it minimizes $L(y,\hat{y})$, the prediction error (or loss) between the predicted $\hat{y}$ and the measured y. Meanwhile, it tries to constrain the number of input features to the DNN to be no greater than k. To implement this constraint, we group the weights outgoing from the same input feature together and then limit the number of non-zero weight groups to be no larger than k. Hence, the optimization objective can be expressed as

$$\min_{w} L(y,\hat{y}) \quad \text{subject to} \quad \|w\|_0 = \left|\{p : \|W_p\| \neq 0\}\right| = \left|\left\{p : \sqrt{\sum_{m=1}^{M}(w_{pm})^2} \neq 0\right\}\right| \le k \tag{1}$$

where $\|w\|_0$ is the $l_0$ norm of the weight vector w, $|\cdot|$ is the cardinality of the set of weight groups, and p and m are the indices of the input feature and the neuron in the first hidden layer, respectively. The magnitude of the weight group for feature p is measured by $\|W_p\|$. Since the $l_0$ norm is non-convex and non-differentiable, the $l_1$ norm, which calculates the sum of the absolute values of a vector, is often used as a convex proxy (Tibshirani, 1996). It can be shown geometrically that the $l_1$ norm is the closest convex approximation to the $l_0$ norm (Rosasco, 2010). Following the convex approximation, we obtain Equation (2),

$$\min_{w}\left[\frac{1}{N}\sum_{n=1}^{N} L\left(y^{(n)},\hat{y}^{(n)}\right) + \lambda\sum_{p\in P}\sqrt{\sum_{m=1}^{M}(w_{pm})^2}\right] \tag{2}$$

The second term in Equation (2) introduces a bias into the prediction. The hyperparameter λ is known as the group lasso penalty, which trades off the sparsity of the input features against the prediction accuracy. As λ grows, the neural network tries to minimize the sum of the weight-group magnitudes, and therefore more weight groups are likely to shrink to near 0. The input features with nonzero weight groups are the remaining features that contribute to the prediction. In this way, the model can be built from fewer input features, but the prediction accuracy may decrease due to the loss of information.

However, for many physical problems, some of the features are known in advance to be important; these are termed prior knowledge. In this case, the objective is to select a small number of additional features that complement the prior-knowledge features and lead to predictions of acceptable accuracy. Since the conventional sparse-input neural network cannot tell the difference between prior knowledge and additional features, the optimization objective in Equation (2) needs to be modified as follows,

$$\min_{w}\left[L(y,\hat{y}) + s_p\lambda\sum_{p\in P_p}\sqrt{\sum_{m=1}^{M}(w_{pm})^2} + s_a\lambda\sum_{p\in P_a}\sqrt{\sum_{m=1}^{M}(w_{pm})^2}\right] \tag{3}$$

where $P_p$ denotes the feature set representing the prior knowledge, and $P_a$ denotes the set of all the additional features. The parameters $s_p$ and $s_a$ are the weights assigned to the prior knowledge and the additional features, respectively. These weights represent the level of confidence in the features' importance for prediction (Lian, 2018). The conventional sparse-input neural network in Equation (2) is a special case of the weighted formulation in Equation (3) with $s_a = s_p = 1$, which assumes equal confidence in all the features' importance. For the prior knowledge (i.e., the feature set known to be important), we would like to prevent the algorithm from shrinking their weight groups to near 0; hence $s_p/s_a$ should be set close to 0.

Relevant features for long flexible cylinders subjected to vortex-induced vibrations

Flexible cylinder VIV modeling

Figure 2 is a sketch of a tensioned elastic cylinder under a linearly sheared current profile U(z) distributed along axis z, which causes the cylinder's vibration in both the inline (IL) and crossflow (CF) directions with respect to the incoming current. The vibration of the elastic cylinder can be approximated as that of a tensioned Euler-Bernoulli beam.

The parameterization of the VIV force in the “power-in” region may involve

$$F_{cf} = f\left(U(z), x(z,t), y(z,t), \rho, \mu, L, L_{in}, D, m(z), c_{s,cf}(z), c_{h,cf}(z), P(z,t), EI(z)\right) \tag{6}$$

$$F_{il} = f\left(U(z), x(z,t), y(z,t), \rho, \mu, L, L_{in}, D, m(z), c_{s,il}(z), c_{h,il}(z), P(z,t), EI(z)\right) \tag{7}$$

where ρ, μ, L and D are the fluid density, the dynamic viscosity, the cylinder's length and the cylinder's diameter, respectively. If the spatiotemporal root-mean-square (rms) amplitude of crossflow vibration $A_{rms,cf}$ in the power-in region is the target output, then from Equations (4)-(7), the predictive model can be expressed as

$$A_{rms,cf} = f\left(U(z), x(z,t), y(z,t), \rho, \mu, L, L_{in}, D, m(z), c_{s,cf}(z), c_{h,cf}(z), P(z,t), EI(z)\right) \tag{8}$$
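Looking back at the weighted objective in Equation (3), the penalty term can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function names and the toy weight matrix are hypothetical, the group magnitude is taken to be the L2 norm of each input feature's outgoing weights (the usual group-lasso reading), and the values $s_p = 0.02$, $s_a = 1$ are the ones reported in the network setup of the paper.

```python
import math

def group_norms(first_layer_weights):
    """L2 norm of each input feature's outgoing weights (one row per feature)."""
    return [math.sqrt(sum(w * w for w in row)) for row in first_layer_weights]

def weighted_group_lasso(first_layer_weights, prior_idx, lam, s_p=0.02, s_a=1.0):
    """Penalty part of Equation (3): prior-knowledge groups are scaled by s_p,
    all remaining (additional) groups by s_a. Hypothetical helper."""
    prior = set(prior_idx)
    norms = group_norms(first_layer_weights)
    prior_term = sum(n for p, n in enumerate(norms) if p in prior)
    extra_term = sum(n for p, n in enumerate(norms) if p not in prior)
    return lam * (s_p * prior_term + s_a * extra_term)

# Toy weights with P = 3 input features and M = 4 first-layer neurons,
# the same shape as the example in Figure 1 (values are made up).
W = [
    [3.0, 4.0, 0.0, 0.0],  # feature 0, group norm 5
    [0.0, 0.0, 0.0, 0.0],  # feature 1, group norm 0 (already pruned)
    [1.0, 2.0, 2.0, 0.0],  # feature 2, group norm 3
]

# s_p = s_a = 1 with no prior set recovers the unweighted penalty of Equation (2).
penalty_equal = weighted_group_lasso(W, prior_idx=[], lam=0.1, s_p=1.0, s_a=1.0)

# Treating feature 0 as prior knowledge makes its group almost free to keep.
penalty_prior = weighted_group_lasso(W, prior_idx=[0], lam=0.1, s_p=0.02, s_a=1.0)
```

With a small $s_p/s_a$ the optimizer gains little by shrinking a prior-knowledge group, which is the mechanism the text relies on to keep those features in the model while the additional features compete under the full penalty.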
The equations of motion in the crossflow and inline directions can be expressed as

$$m(z)\frac{\partial^2 y}{\partial t^2} + c_s(z)\frac{\partial y}{\partial t} - P(z,t)\frac{\partial^2 y}{\partial z^2} + EI(z)\frac{\partial^4 y}{\partial z^4} = F_{cf} \tag{4}$$

$$m(z)\frac{\partial^2 x}{\partial t^2} + c_s(z)\frac{\partial x}{\partial t} - P(z,t)\frac{\partial^2 x}{\partial z^2} + EI(z)\frac{\partial^4 x}{\partial z^4} = F_{il} \tag{5}$$

where x and y are the displacements in the inline and crossflow directions, m(z) is the cylinder's mass per unit length, P(z,t) is the tension of the vibrating cylinder, EI(z) is the bending stiffness, $c_s$ is the structural damping coefficient per unit length, and $F_{cf}$ and $F_{il}$ are the vortex-induced forces on the cylinder. The loading transfers energy from the fluid to the structure in a well-defined region of length $L_{in}$, which is the “power-in” region. Outside this region, the vortex loading dissipates energy by transferring it from the structure to the fluid through the hydrodynamic damping coefficient $c_h(z)$. The location of the “power-in” region can be identified from structural vibration measurements in experiments or simulation (Rao 2015). Under steady-state, narrow-banded vibration, the total power dissipation in the flexible pipe can be normalized to an equivalent damping coefficient $c_e$ (Vandiver et al. 2018).

Figure 2: Side and front view of a cylinder under VIV

The VIV loading is the result of nonlinear interaction between vortex shedding and structural vibration via complicated feedback mechanisms that depend on the structural properties, the current profile and the structure's motion at every instant.

It can be observed that Equations (6)-(8) involve the spatial-temporal distribution of the structural response and system properties, which will be further simplified and represented by some global VIV features.

Spatial-temporal analysis for typical VIV

VIV measurements from the 2011 Shell experiments on a 38-m-long cylinder (Lie et al. 2013) were studied in this investigation. The measured crossflow displacements in a linearly sheared current are presented in Figure 3. The top figures are the CF response time series at two locations within the “power-in” region, while their corresponding wavelet analysis is shown at the bottom. The vibration is found to be narrow-banded, with the dominant frequency $\omega_{cf}$ drifting in time. Given the dominant vibration frequency and structural properties, the corresponding wavenumber $k_{cf}$ can be estimated from the dispersion relationship.

Figure 3: Top: Time series of crossflow displacement at two locations in the power-in region; Bottom: Wavelet analysis on the measured time series. (2011 Shell experiment, D=30 mm, linearly sheared flow, Umax=1.6 m/s)

Meanwhile, Figure 4 shows the corresponding spatial-temporal distribution of crossflow displacement for the same test condition. The response is nonstationary, with a mixture of standing-wave and travelling-wave components. To better capture the temporal variation of the vibration signal, a moving window analysis is conducted. The vibration signal is windowed into overlapping time frames of 3 vibration cycles each, with 75% overlap.

Complex proper orthogonal decomposition (POD) is conducted on the crossflow displacement in each spatial-temporal window in the “power-in” region to decompose the displacement in each window into several orthogonal complex modes (Feeny 2008). The ratio between the modal energy of the dominant POD mode and the total energy is defined as κ, which indicates the dominance of the principal mode. Additionally, by comparing the real and imaginary components of the dominant complex mode, the travelling wave index α can be defined, with α = 1 for travelling waves and α = 0 for standing waves (Feeny 2008). The middle and bottom of Figure 4 show the temporal variation of the travelling wave index and the modal dominance factor analyzed in the power-in region, which suggests that the VIV process is dominated by a single POD mode, but that the mode may vary from standing to travelling waves. Analysis of the inline vibration shows a similar spatial-temporal distribution.

$$A_{rms,cf} = f\left(U_{rms}, \Delta U, A_{rms,il}, \omega_{cf}, \omega_{il}, k_{cf}, k_{il}, \alpha_{cf}, \alpha_{il}, \kappa_{cf}, \kappa_{il}, \rho, \mu, L, L_{in}, D, m, c_s, c_{e,cf}, c_{e,il}, P_0, P, EI\right) \tag{9}$$

where $U_{rms}$ and $\Delta U/L$ are the spatial root-mean-square and the shear gradient of the current profile, respectively, within the power-in region. $A_{rms,cf}$ and $A_{rms,il}$ are the spatiotemporal rms of the crossflow and inline VIV amplitude in the power-in region. $c_s$ and $c_e$ are, respectively, the structural damping coefficient and the equivalent rigid cylinder damping coefficient that leads to the same power dissipation, as discussed by Vandiver et al. (2018). $P_0$ and P are the initial tension before VIV and the mean tension during the VIV process, respectively.

Non-dimensionalizing Equation (9) gives

$$A_{cf}^* = f\left(Re, \beta, A_{il}^*, V_{rcf}, V_{ril}, Lk_{cf}, \alpha_{cf}, \alpha_{il}, \kappa_{cf}, \kappa_{il}, L/L_{in}, L/D, m^*\zeta, c_{cf}^*, c_{il}^*, \frac{P_0 L^2}{EI}, \frac{P}{EI k_{cf}^2}\right) \tag{10}$$

where $A_{cf}^* = A_{rms,cf}/D$ and $A_{il}^* = A_{rms,il}/D$ are the dimensionless crossflow and inline response amplitudes, respectively. $Re = \rho U_{rms} D/\mu$ is the Reynolds number; $\beta = (D/U_{rms})(\Delta U/L)$ is known as the shear parameter; $V_{rcf} = 2\pi U/(\omega_{cf} D)$ and $V_{ril} = 2\pi U/(\omega_{il} D)$ are the crossflow and inline reduced velocities, respectively; $m^*\zeta = 4m\zeta/(\pi\rho D^2)$ is known as the mass damping parameter in the VIV literature, which historically has been thought to be important in controlling a rigid cylinder's VIV amplitude. $c_{cf}^* = 2c_{e,cf}\omega_{cf}/(\rho U_{rms}^2)$ and $c_{il}^* = 2c_{e,il}\omega_{il}/(\rho U_{rms}^2)$ are the dimensionless forms of the equivalent damping parameter in the crossflow and inline directions, respectively (Vandiver et al. 2018).
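As a worked illustration of the definitions that accompany Equation (10), the short Python sketch below evaluates several of the dimensionless groups for one made-up operating point. The function name, the dictionary keys and every numerical input are hypothetical; the flow speed U in the reduced velocities is assumed here to be $U_{rms}$, which the text does not state explicitly; only the structural damping ratio of 0.5% echoes a value reported for the experiments.

```python
import math

def dimensionless_groups(rho, mu, U_rms, dU, L, D,
                         omega_cf, omega_il, m, zeta, c_e_cf, c_e_il):
    """Evaluate selected dimensionless groups from Equation (10).
    Assumes U = U_rms in the reduced velocities (an assumption)."""
    return {
        "Re": rho * U_rms * D / mu,                           # Reynolds number
        "beta": (D / U_rms) * (dU / L),                       # shear parameter
        "Vr_cf": 2.0 * math.pi * U_rms / (omega_cf * D),      # crossflow reduced velocity
        "Vr_il": 2.0 * math.pi * U_rms / (omega_il * D),      # inline reduced velocity
        "mass_damping": 4.0 * m * zeta / (math.pi * rho * D ** 2),
        "c_cf_star": 2.0 * c_e_cf * omega_cf / (rho * U_rms ** 2),
        "c_il_star": 2.0 * c_e_il * omega_il / (rho * U_rms ** 2),
    }

# Made-up operating point: water, a 30-mm pipe of 38 m length, and a crossflow
# frequency chosen so that the crossflow reduced velocity equals 7.
omega_cf = 2.0 * math.pi * 1.0 / (7.0 * 0.03)
groups = dimensionless_groups(
    rho=1000.0, mu=1.0e-3, U_rms=1.0, dU=0.5, L=38.0, D=0.03,
    omega_cf=omega_cf, omega_il=2.0 * omega_cf,  # inline at twice the crossflow frequency
    m=1.0, zeta=0.005, c_e_cf=2.0, c_e_il=1.0,
)
```

For these inputs the Reynolds number comes out at about $3\times10^4$, inside the range covered by the experiments, and the ratio of crossflow to inline reduced velocity is 2 by construction.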
Figure 4: Contour plot of CF vibration around the dominant frequency, its corresponding travelling wave index $\alpha_{cf}$, and mode dominance factor $\kappa_{cf}$ in the estimated “power-in” region (z/L=0~0.3)

Dimensional analysis for narrow-banded VIV process

For a homogeneous, tensioned cylinder in uniform or linearly sheared current undergoing narrow-banded VIV, Equation (8) can be approximated by the relevant global quantities collected in Equation (9).

Although Equation (10) suggests that crossflow response prediction in the “power-in” region may require considering the effect of all 17 dimensionless variables, it is likely that the dimension of the input features can be further reduced due to redundancy or correlation between features or irrelevance to the prediction target. We are interested in finding a smaller and more manageable subset of parameters that are ultimately the most important out of the full set when it comes to determining the CF response amplitude. The motivation behind this is our interest in understanding what causes the CF response variability that is observed in the temporal domain. At the very least, we would like to start associating changes in certain parameters with that variability, which is often observed but is too complicated to understand.

Feature selection for flexible cylinder VIV

Dataset description

The dataset is from a set of experiments conducted by Shell Oil Co. in 2011 at Marintek. The vibration of 38-meter-long cylinders under various current profiles was measured. The test matrix included two pipes with different diameters (30 mm and 80 mm) but the same bending stiffness. The cylinders were tested in uniform and linearly sheared current profiles with the maximum flow speed, Umax, ranging from 0.5 m/s to 2.5 m/s. This resulted in the Reynolds number Re ranging from $1.0\times10^4$ to $20\times10^4$. The dataset also included cases where the 80-mm pipe was covered with strakes over 50% of its length. These pipe tests were conducted in uniform flows with Umax varying from 0.5 m/s to 1.5 m/s. The strakes dissipated vibration energy and limited the power-in region to $L_{in} = 0.5L$, 50% of the cylinder's length. Detailed descriptions of the experiments can be found in Lie (2013) and Rao (2015). The structural damping ratio ζ in the experiment was around 0.5% (Vandiver et al. 2018). The crossflow reduced velocity $V_{rcf}$ varies in a narrow range from 6 to 9, and $V_{rcf}/V_{ril} \approx 2$.

Deep neural network setup

The deep neural network was constructed using two hidden layers. Each hidden layer had twenty neurons with a sigmoid activation function. The total number of data points was around 6000. 70% of the experimental data were used as training data, while the rest was used as test data. The input variables x were standardized to keep the features at the same scale, while the output variables y were normalized to values between 0 and 1. The mean absolute percentage error (MAPE) was chosen as the loss function $L(y,\hat{y})$ between prediction and measurement. The neural network optimization was conducted via the FTRL algorithm (McMahan 2013). During neural network training, the batch size was 128 and the learning rate was 0.01. The $s_a$ and $s_p$ in Equation (3) were fixed at 1 and 0.02, respectively. After the optimization, we remove from the prediction model the input features whose weight-group magnitudes have shrunk to near 0. In this paper, the magnitude of a weight group is considered to be near 0 when its value is less than 5% of the maximum magnitude among all of the input features.

Prior physical knowledge for VIV

Experimental studies on small spring-mounted rigid cylinders show that the response amplitude increases with increasing Reynolds number in the range $10^3$ to $10^4$ and decreases as the dimensionless damping increases (Govardhan & Williamson 2006, Vandiver 2012). Similarly, studies on long flexible cylinders have shown that the Reynolds number and the dimensionless damping continue to play important roles in the VIV response amplitude (Resvanis 2012, Rao 2015), but as discussed earlier, the large number of potentially relevant parameters and the response variability result in scatter in the data.

Because it is known that the Reynolds number Re and the dimensionless damping parameter $c_{cf}^*$ are important, these two parameters are designated as prior knowledge. The shear parameter β, which is ideally suited to differentiating between uniform and sheared flows, was the third parameter chosen as prior knowledge before starting the feature selection process.

Feature selection on top of prior physical knowledge

The feature selection procedure was conducted by increasing the hyperparameter λ from 0.01 until all the input features except the prior knowledge shrank to 0. Figure 5 demonstrates how varying the value of λ determines the number of features chosen by the proposed algorithm. In the figure, the retained features are indicated by the presence of a black bar at each λ value tested.

Figure 5: Top: The variation of remaining features with λ. The black bars represent the features selected by the neural network. Bottom: Comparison of the prediction error (MAPE) between the weighted sparse-input neural network (WSPINN) prediction and the DNN prediction using combinatorially searched features under a given number of features

The prediction error varies with the retained features in the prediction model, as presented in the bottom part of Figure 5. At each number of features, a brute force approach that searches all the possible combinations of the additional features is also carried out. The error obtained from the WSPINN is compared with hundreds of runs of DNN predictions using combinatorially searched features in addition to the 3 features representing prior knowledge. The comparison suggests that the WSPINN is able to find the feature subsets that give the smallest prediction error among all the feature combinations. Besides, it can be observed that there can be multiple combinations of features that give similar prediction accuracy. For example, both the additional features $A_{il}^*, \kappa_{cf}$ and $A_{il}^*, \kappa_{il}$ give a prediction error around 13%. This suggests correlations and interactions among some of the VIV features. After balancing the prediction accuracy with the sparsity of input features, we find that the feature subset containing 5 features, $Re, \beta, c_{cf}^*, A_{il}^*, \kappa_{cf}$, gives 13% MAPE, which is close to the 10.6% MAPE obtained using all 17 features.

We have also applied the WSPINN algorithm to other VIV-related problems, such as the prediction of a rigid cylinder's VIV amplitude and of flexible cylinders' VIV amplitude at higher harmonics. Because of space limitations we cannot demonstrate this here. Moreover, since this paper only studied the important parameters for VIV in sheared and uniform current profiles, the importance of the features may be different for more complicated current profiles.

Physical insight interpretation

The importance of the identified features for flexible cylinder VIV can be examined by systematically varying the ranges of input features to the constructed neural network models. Figure 6 and Figure 7 show the effect of varying Re and $c_{cf}^*$ while constraining the other variables in the prediction model to characteristic values most often observed in the Shell experiments. The black dots are the experimental measurements within 20% of the reference values and are included to demonstrate that the prediction model (contours) did in fact have data in that vicinity.

Figure 6: Contours of CF RMS amplitude as a function of $c_{cf}^*$ and Re (in uniform flow)

Figure 7: Contours of CF RMS amplitude as a function of $c_{cf}^*$ and Re (in sheared flow)
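The procedure behind contour plots of this kind, sweeping two inputs of a fitted model while holding the others at reference values and then marking the measurements that lie close to those reference values, can be sketched as follows. This is a hedged illustration only: `sweep_grid`, `near_reference`, the feature names, the reference values and in particular `toy_model` are hypothetical stand-ins, and `toy_model` is an arbitrary placeholder with no physical meaning rather than the trained network.

```python
def sweep_grid(model, reference, name_x, values_x, name_y, values_y):
    """Evaluate `model` on a 2-D grid over two features while every other
    feature stays at its reference value. Returns rows indexed by values_y."""
    grid = []
    for vy in values_y:
        row = []
        for vx in values_x:
            point = dict(reference)   # copy so the reference is left untouched
            point[name_x] = vx
            point[name_y] = vy
            row.append(model(point))
        grid.append(row)
    return grid

def near_reference(sample, reference, held_fixed, tol=0.20):
    """True when every held-fixed feature of `sample` lies within a relative
    tolerance `tol` of its reference value (20% by default)."""
    return all(abs(sample[k] - reference[k]) <= tol * abs(reference[k])
               for k in held_fixed)

def toy_model(features):
    """Arbitrary placeholder for a trained predictor; not a physical model."""
    return features["Re"] * 1.0e-5 + features["c_cf"]

# Hypothetical reference values for the five selected features.
reference = {"Re": 5.0e4, "c_cf": 0.3, "beta": 0.01, "A_il": 0.2, "kappa_cf": 0.8}
held_fixed = ["beta", "A_il", "kappa_cf"]

grid = sweep_grid(toy_model, reference, "Re", [1.0e4, 1.0e5], "c_cf", [0.1, 0.5])

close_sample = {"Re": 2.0e4, "c_cf": 0.2, "beta": 0.011, "A_il": 0.21, "kappa_cf": 0.7}
far_sample = {"Re": 2.0e4, "c_cf": 0.2, "beta": 0.011, "A_il": 0.21, "kappa_cf": 0.5}
keep_close = near_reference(close_sample, reference, held_fixed)
keep_far = near_reference(far_sample, reference, held_fixed)
```

Only samples that pass the neighbourhood check would be overlaid on the contours, which is what shows whether the model is interpolating within the data or extrapolating away from it.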
The results demonstrate that increasing Reynolds number tends to increase the spatiotemporal CF RMS amplitude. This Reynolds number effect is obvious in the uniform flow data, which typically have small dimensionless damping values ($c_{cf}^*$ < 0.3-0.4), while it is virtually non-existent in the sheared flow cases, which have larger damping parameters ($c_{cf}^*$ > 0.4).

Figure 8 shows the effect of varying $A_{il}^*$ and $\kappa_{cf}$ while constraining the other variables. It can be seen that the crossflow response tends to increase with the inline response. Such a relationship has also been observed in spring-mounted rigid cylinder VIV experiments (Dahl 2008), where the fluctuating inline force increased with crossflow motion. Finally, the prediction model suggests that as the mode-participation factor increases, so does the CF response amplitude. Note that both standing-wave and travelling-wave responses can result in high mode-participation factors; in this situation, the factor primarily characterizes whether all points on the flexible cylinder are responding in a similar manner (spanwise coherence).

Figure 8: Contours of CF RMS amplitude as a function of $A_{il}^*$ and $\kappa_{cf}$ (in sheared flow)

Special properties of the approach compared to other machine learning methods

1. Direct and learning-task-dependent dimension reduction in the original feature space while retaining the prior information in the model.

The WSPINN is one of the dimension reduction approaches. However, unlike the widely used PCA or auto-encoders, WSPINN seeks to reduce the dimension directly in the original input feature space. Moreover, through machine learning prediction, WSPINN is able to identify the most important input features with respect to the target output. The much smaller constraints placed on the prior knowledge also allow the prior knowledge to be retained in the prediction model, both to improve prediction and to help identify additional important features.

2. High prediction accuracy due to the universal approximation property of the DNN (Hornik, 1993)

The WSPINN is a feature selection approach embedded in a DNN, which is able to predict nonlinear input-output relationships accurately. For instance, for the crossflow VIV amplitude prediction, the prediction errors from the DNN and from linear regression given $Re, \beta, c_{cf}^*, A_{il}^*, \kappa_{cf}$ are 13% and 25%, respectively. However, training a DNN with WSPINN requires several rounds of iterations to optimize the weights in each layer; hence it was found to be more computationally expensive than most of the other machine learning methods. We consider the computational cost acceptable since our intention is not to create a fast predictive tool but rather to use machine learning to reduce the dimensionality of the problem as we try to understand the importance of each of the many governing parameters.

Conclusion

In this paper, we modify a sparse-input neural network so that it can efficiently select additional features that complement a subset of features known to be important in advance (i.e. prior knowledge). The algorithm was applied to the experimental results from vortex-induced vibration of flexible cylinders. The complicated spatiotemporal response measurements of the continuous system are reduced to an equivalent 2-degree-of-freedom system. The proposed algorithm is then used to investigate the role of Reynolds number, damping parameter, and shear parameter (3 parameters for which we have prior knowledge), as well as 14 other parameters that the dimensional analysis indicated might be important. The algorithm was able to reduce the 14 additional parameters to just 2 additional parameters on top of the prior knowledge. We found that this feature selection technique is much more efficient than a brute force combinatorial search.

Acknowledgement

This research has been sponsored by the members of the SHEAR7 Joint Industry Project: BP, Chevron, ExxonMobil, Petrobras, SBM Offshore, Shell International Exploration and Production, Equinor, & Technip USA.

References

Bourguet, R., Karniadakis, G., Triantafyllou, M., 2011. Vortex-induced vibrations of a long flexible cylinder in shear flow. Journal of Fluid Mechanics, 677: 342-382.

Dahl, J. J. M., 2008. Vortex-induced vibration of a circular cylinder with combined in-line and cross-flow motion (Doctoral dissertation, Massachusetts Institute of Technology).

Feeny, B.F., 2008. A complex orthogonal decomposition for wave motion analysis. Journal of Sound and Vibration, 310(1-2): 77-90.

Feng, J. and Simon, N., 2017. Sparse-input neural networks for high-dimensional nonparametric regression and classification. arXiv preprint arXiv:1711.07592.

Govardhan, R.N. and Williamson, C.H.K., 2006. Defining the ‘modified Griffin plot’ in vortex-induced vibration: revealing the effect of Reynolds number using controlled damping. Journal of Fluid Mechanics, 561: 147-180.

Guyon, I. and Elisseeff, A., 2003. An introduction to variable and feature selection. Journal of Machine Learning Research, 3(Mar): 1157-1182.

Hornik, K., 1993. Some new results on neural network approximation. Neural Networks, 6(8): 1069-1072.

Lian, L., Liu, A. and Lau, V.K., 2018. Weighted LASSO for sparse recovery with statistical prior support information. IEEE Transactions on Signal Processing, 66(6): 1607-1618.

Lie, H. et al., 2013, August. Comprehensive riser VIV model tests in uniform and sheared flow. In ASME 2012 31st International Conference on Ocean, Offshore and Arctic Engineering. 923-930.

McMahan, H.B. et al., 2013, August. Ad click prediction: a view from the trenches. In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 1222-1230. ACM.

Rao, Z., 2015. The flow of power in the vortex-induced vibration of flexible cylinders. Ph.D. Dissertation, Department of Mechanical Engineering, Massachusetts Institute of Technology, Cambridge, MA.

Resvanis, T.L. et al., 2012, July. Reynolds number effects on the vortex-induced vibration of flexible marine risers. In ASME 2012 31st International Conference on Ocean, Offshore and Arctic Engineering. 751-760.

Rosasco, 2010. Statistical Learning Theory and Applications. Lecture notes. Massachusetts Institute of Technology.

Rudy, S.H., Brunton, S.L., Proctor, J.L. and Kutz, J.N., 2017. Data-driven discovery of partial differential equations. Science Advances, 3(4), p.e1602614.

Sarpkaya, T., 2004. A critical review of the intrinsic nature of vortex-induced vibrations. Journal of Fluids and Structures, 19(4): 389-447.

Scardapane, S., Comminiello, D., Hussain, A. and Uncini, A., 2017. Group sparse regularization for deep neural networks. Neurocomputing, 241: 81-89.

Sonin, A.A., 2001. Dimensional analysis. Technical report, Massachusetts Institute of Technology.

Tibshirani, R., 1996. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B (Methodological), 58(1): 267-288.

Vandiver, J.K., 1993. Dimensionless parameters important to the prediction of vortex-induced vibration of long, flexible cylinders in ocean currents. Journal of Fluids and Structures, 7(5): 423-455.

Vandiver, J.K., 2012. Damping parameters for flow-induced vibration. Journal of Fluids and Structures, 35: 105-119.

Vandiver, J.K., Ma, L. and Rao, Z., 2018. Revealing the effects of damping on the flow-induced vibration of flexible cylinders. Journal of Sound and Vibration, 433: 29-54.

Ye, T., Wang, X., Davidson, J. and Gupta, A., 2018. Interpretable intuitive physics model. In Proceedings of the European Conference on Computer Vision (ECCV). 87-102.