Crack Extension Life and Critical Crack Length Prediction Based
on XGBoost
Yu Liu 1, Kaixing Zhao 2, Fusheng Hou 1, Xiaohui Hao 1
1 COMAC Shanghai Aircraft Design & Research Institute, No.5188 Jinke Road, Shanghai, 200120, China
2 Northwestern Polytechnical University, 127 West Youyi Road, Beilin District, Xi'an, Shaanxi, 710072, China

                 Abstract
                 Damage tolerance design ensures the structural safety of civil aircraft throughout their life
                 cycle, which requires accurate analysis of crack extension life and crack extension length. This
                 paper proposes an XGBoost-based method for predicting crack extension life and crack extension
                 length in civil aircraft structures. The method uses a machine learning algorithm to train a
                 structural state prediction model. Its advantage is that it can quickly determine the crack life
                 and crack length without relying on data processing techniques or engineering diagnosis
                 experience built on large amounts of collected data, which provides a more flexible way to
                 determine the crack extension life and crack extension length under various influencing
                 factors. Among the machine learning algorithms compared, the XGBoost model obtained the
                 highest test scores. The experimental results show that the method accurately predicts crack
                 extension life and critical crack length and can be used for rapid engineering evaluation.

                 Keywords
                 Damage tolerance, crack extension, machine learning, XGBoost

1. Introduction

    Damage tolerance design is a modern fatigue fracture control method, developed and progressively
applied since the 1970s, that ensures the safety of structures throughout their life cycle. The theory
assumes that every structural material contains internal defects introduced during manufacturing or
service, so the growth rate of these defects and the residual strength of the structure must be determined
using damage theories (such as fracture mechanics) and a given external load [1]. Damage tolerance is
also a major component of the strength certification assessment of aircraft by NASA, FAA, and other
agencies. In damage tolerance analysis, in addition to identifying the damage-tolerance-critical parts
and their sensitive locations, the accuracy of the crack extension life and crack extension length
calculations for these parts is essential to reaching a correct conclusion [2]. However, the airframe
contains many types of structural details, their load states are complex, and the materials differ from
part to part [3]. Therefore, in addition to the traditional fracture mechanics-based analysis methods,
a tool that can quickly evaluate the crack extension life and the final crack length of the corresponding
structure is also needed.
    XGBoost, short for eXtreme Gradient Boosting, is an algorithmic toolkit based on the Boosting
framework that performs strongly in parallel computation efficiency, missing value handling, and
prediction performance [4]. In data science, XGBoost is well suited to data mining. The distributed
version of XGBoost is highly portable and runs on various distributed environments such as Kubernetes
and Hadoop, making it a good solution for industrial large-scale data.

ICBASE2022@3rd International Conference on Big Data & Artificial Intelligence & Software Engineering,
October 21-23, 2022, Guangzhou, China
EMAIL: liuyu1@comac.cc (Yu Liu); kaixing.zhao@nwpu.edu.cn (Kaixing Zhao); houfusheng@comac.cc (Fusheng Hou);
haoxiaohui@comac.cc (Xiaohui Hao)
ORCID: 0000-0001-7565-519X (Yu Liu); 0000-0002-0000-084X (Kaixing Zhao); 0000-0002-3192-3030 (Fusheng Hou);
0000-0001-5233-7762 (Xiaohui Hao)
              © 2022 Copyright for this paper by its authors.
              Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
              CEUR Workshop Proceedings (CEUR-WS.org)


   Current data-driven deep learning and machine learning theories, the latest research results in
pattern recognition, have achieved fruitful results in big data processing in various fields thanks to
their powerful modeling and characterization capabilities. XGBoost-based crack extension prediction can
therefore avoid the reliance on traditional mechanism-based fracture mechanics analysis, adaptively
extract structural states, construct complex models, and finally realize end-to-end modeling for the
prediction of complex indicators [5, 6, 7].

2. Crack extension dataset creation

   The fuselage skin is an important part of the overall aircraft structure. In this paper, crack
extension prediction is performed for the fuselage and wing skin structure, which is usually treated as
an infinite plate with a central crack. The simplified model is shown in Figure 1.


Figure 1: Central crack simplified model

    Since a specimen-level crack extension test matrix usually cannot meet the data volume demand of
machine learning, the data set in this paper is generated with crack extension analysis software. If
only the pressurization load of the skin is considered, the load state of the structure can be
simplified to a constant-amplitude spectrum. The input parameters are structure width (plate width),
maximum stress, structure thickness, and initial crack size [8, 9].
    The parameters and their ranges are as follows.
    • structure width (plate width), 100-1500 mm;
    • maximum stress, 100-200 MPa;
    • structure thickness, 0.5-5.5 mm;
    • initial flaw (crack) size, 1.27-3.77 mm.
    The dataset was created as follows.
    1. A Python script automatically generates the NASGRO input files based on the planned test
        matrix;
    2. The batch calculations are performed;
    3. A script extracts the cycle number and crack length information from each output file and
        summarizes them in the output result data table.
    Over 1200 sets of data were eventually collected, each including the four input parameters as well
as the crack extension life and critical crack length [10].
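
    Step 1 of the procedure above depends on a planned test matrix, which the paper does not spell
out. As a minimal sketch, assuming a full-factorial grid with six evenly spaced levels per parameter
(an illustrative choice that gives 6^4 = 1296 cases), the matrix could be built as follows. The names
`RANGES`, `linspace`, and `build_test_matrix` are hypothetical, and the generation of the actual
NASGRO input files is not shown.

```python
import csv
import io
import itertools

# Parameter ranges from the list above: (min, max) in mm or MPa.
RANGES = {
    "width_mm": (100.0, 1500.0),
    "max_stress_mpa": (100.0, 200.0),
    "thickness_mm": (0.5, 5.5),
    "initial_crack_mm": (1.27, 3.77),
}


def linspace(lo, hi, n):
    """Return n evenly spaced values from lo to hi, inclusive."""
    step = (hi - lo) / (n - 1)
    return [round(lo + i * step, 4) for i in range(n)]


def build_test_matrix(levels=6):
    """Full-factorial grid: every combination of `levels` values per parameter.

    The grid design and the number of levels are assumptions for illustration.
    """
    axes = [linspace(lo, hi, levels) for lo, hi in RANGES.values()]
    return [dict(zip(RANGES, combo)) for combo in itertools.product(*axes)]


matrix = build_test_matrix(levels=6)  # 6**4 = 1296 planned cases

# Serialize the matrix as CSV text; a real pipeline would instead write one
# solver input file per row (that step is not shown here).
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(RANGES))
writer.writeheader()
writer.writerows(matrix)
print(len(matrix), "cases; first case:", matrix[0])
```

    A random or space-filling design over the same ranges would serve equally well; the grid is used
here only because it is easy to inspect.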

3. Crack extension cycle and critical crack prediction based on XGBoost

   This section covers the modeling and prediction of crack extension life and critical crack length
based on XGBoost, including the modeling process, data preprocessing, parameter optimization, model
training, model testing results, and comparison with other algorithms.


3.1.    Modeling Process

   The overall XGBoost prediction model can be divided into several parts: data reading and
preprocessing, parameter optimization by cross-validation, model training, model testing, and result
output. The flow is shown in Figure 2.


Figure 2: Modeling Process

3.2.    Data preprocessing and parameter optimization

    The original data was saved in CSV format. After removing the samples containing missing values
and the samples with excessively high lifetimes, 900 samples remained. The data set was split 8:2:
80% (720 samples) formed the training set used to train the XGBoost model, and 20% (180 samples)
formed the test set used for performance testing.
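
    The cleaning and 8:2 split described above can be sketched in plain Python. This is only an
illustration: the lifetime threshold is not stated in the paper, so `life_cap` is a hypothetical
parameter, the rows are synthetic, and the function name `clean_and_split` is an assumption.

```python
import random


def clean_and_split(rows, life_cap, train_fraction=0.8, seed=0):
    """Drop incomplete or over-long-life samples, then shuffle and split."""
    kept = [
        r for r in rows
        # Short-circuit: the lifetime is compared only when no value is missing.
        if all(v is not None for v in r.values()) and r["cycles"] <= life_cap
    ]
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(kept)
    n_train = int(round(train_fraction * len(kept)))
    return kept[:n_train], kept[n_train:]


# Synthetic stand-in for the raw table: 1000 rows, of which 60 have a
# missing input and 40 have an excessive lifetime (hypothetical numbers).
rows = []
for i in range(1000):
    row = {"width_mm": 100.0 + i, "cycles": 10_000 + i}
    if i < 60:
        row["width_mm"] = None
    if i >= 960:
        row["cycles"] = 10_000_000
    rows.append(row)

train_rows, test_rows = clean_and_split(rows, life_cap=1_000_000)
print(len(train_rows), len(test_rows))
```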
    When modeling and training with XGBoost, the number of boosting rounds is an important parameter
that can be searched using the cross-validation function that comes with the library. In this paper,
we use Random Search to find the optimal number of boosting rounds, with an upper limit of 500
rounds, early stopping that terminates automatically after 5 consecutive times without change in the
optimization index (tolerance 0.01), 10-fold cross-validation, and MSE as the optimization index. The
final best number of boosting rounds is 143.
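
    The stopping rule described above (patience of 5, tolerance of 0.01, upper limit of 500 rounds)
can be illustrated independently of any library. The sketch below assumes the mean cross-validated
MSE per boosting round is already available as a list; in practice those scores would come from the
library's cross-validation routine. The function name `best_boosting_round` and the demo curve are
hypothetical, and the exact rule the authors used may differ in detail.

```python
def best_boosting_round(cv_mse, patience=5, tol=0.01, max_rounds=500):
    """Return the 1-based round with the best CV score under early stopping.

    cv_mse[i] is the mean cross-validated MSE after i + 1 boosting rounds.
    The search stops once `patience` consecutive rounds fail to improve the
    best score by more than `tol`, or when `max_rounds` is reached.
    """
    best_score = float("inf")
    best_round = 0
    stale = 0  # consecutive rounds without a meaningful improvement
    for round_idx, score in enumerate(cv_mse[:max_rounds], start=1):
        if best_score - score > tol:
            best_score, best_round, stale = score, round_idx, 0
        else:
            stale += 1
            if stale >= patience:
                break
    return best_round


# Synthetic curve: the score improves by 1.0 per round for ten rounds and
# then stays flat, so the search stops five rounds after the plateau begins.
demo_scores = [20.0 - r for r in range(1, 11)] + [10.0] * 20
print(best_boosting_round(demo_scores))
```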

3.3.    Model training and model testing

    The XGBoost model is trained on the training dataset with 143 boosting rounds. The rest of the
model parameters are listed in Table 1 [11].

Table 1
XGBoost model parameters
      Parameters                    Value                   Parameters                    Value
  colsample_bytree                   1.0                      lambda                       1.0
   scale_pos_weight                  1.0                        eta                        0.3
      base_score                     0.5                    grow_policy                 depthwise
      max_depth                       6                        alpha                       1.0
  colsample_bynode                   1.0                     objective              reg:squarederror


   The model performance was tested using the full test set, and the test set predictions are shown
in Figures 3 and 4. After 143 boosting rounds of training, the performance metrics of the model are
shown in Table 2.

Table 2
XGBoost model test score
        Categories                  RMSE                      MAE                    R square
          Cycles                   119.528                   70.703                   0.9999
        Crack size                  0.526                     0.269                   0.9997
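
    The three scores in the table follow their standard definitions. As a minimal sketch with
made-up numbers (not the paper's data), they can be computed as follows; the function name
`regression_scores` is hypothetical.

```python
import math


def regression_scores(y_true, y_pred):
    """Return RMSE, MAE, and the coefficient of determination (R square)."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)              # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    return {
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(e) for e in errors) / n,
        "r2": 1.0 - ss_res / ss_tot,
    }


# Toy example: three exact predictions and one that is off by 2.
scores = regression_scores([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 6.0])
print(scores)
```

    Because RMSE and MAE carry the units of the target, the values for cycles and crack size are not
directly comparable with each other, whereas R square is dimensionless.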


Figures 3 and 4: Actual value vs. predicted value of cycles (left) and crack size (right)

   The attribute weights of the trained model are also analyzed. As shown in Figure 5, the weights
vary widely among the attributes, which is useful for identifying the key influencing factors and
optimizing the model inputs. A relatively high-weighted attribute such as width may provide more
information about the overall structural state of the aircraft.


Figure 5: Attribute weights

3.4.    Comparison with other algorithms

    This study also applies other machine learning methods, including a generalized linear regression
model, a decision tree, and a support vector machine (SVM), to predict the number of crack extension
cycles and the critical crack length. All prediction models were trained on the same 80% of the data
set (720 samples) and tested on the remaining 20% (180 samples). For the decision tree model,
max_depth was kept consistent with the XGBoost parameter above. The corresponding evaluation indexes
on the test set are given for XGBoost and the other three prediction models. The scores for the
number of crack extension cycles are shown in Table 3, and the scores for the critical crack length
are shown in Table 4 [12, 13, 14].


Table 3
Cycles prediction test score
      Categories                   RMSE                       MAE                       R square
        XGBoost                   119.528                    70.703                      0.999
  Generalized linear             15682.212                  12117.993                    0.764
     Decision tree                862.681                    702.061                     0.997
         SVM                     16436.034                  12781.517                    0.754

Table 4
Crack length prediction test score
       Categories                  RMSE                        MAE                      R square
        XGBoost                     0.526                      0.269                     0.9997
  Generalized linear               15.612                     12.453                     0.742
     Decision tree                  1.665                      1.214                     0.997
          SVM                      15.772                     11.050                     0.740

4. Conclusion

   The XGBoost model has the best test scores for both the number of crack extension cycles and the
critical crack length, and the model accuracy can meet the needs of engineering applications. Among
the other tested models, the decision tree has the closest scores to XGBoost, which is consistent with
both being tree-based models applied to the same data set.
   In this paper, the XGBoost-based model predicts the crack extension life and critical crack length
of a center through-crack in a flat plate. The model can be applied to the rapid assessment of damage
tolerance in engineering. However, the training data is obtained from fracture mechanics software
simulations, so some deviation is to be expected when the model is applied to real aircraft
structures. The model can be combined with crack extension test data for further transfer learning.

5. References

[1] Reifsnider K L, Case S W. Damage tolerance and durability of material systems[M]. 2002.
[2] Jones R. Fatigue crack growth and damage tolerance[J]. Fatigue & Fracture of Engineering
     Materials & Structures, 2014, 37(5): 463-483.
[3] Toor P M. A review of some damage tolerance design approaches for aircraft structures[J].
     Engineering fracture mechanics, 1973, 5(4): 837-880.
[4] Chen T, He T, Benesty M, et al. Xgboost: extreme gradient boosting[J]. R package version 0.4-2,
     2015, 1(4): 1-4.
[5] Konda N, Verma R, Jayaganthan R. Machine Learning Based Predictions of Fatigue Crack Growth
     Rate of Additively Manufactured Ti6Al4V[J]. Metals, 2021, 12(1): 50.
[6] Pierson K, Rahman A, Spear A D. Predicting microstructure-sensitive fatigue-crack path in 3D
     using a machine learning framework[J]. JOM, 2019, 71(8): 2680-2694.
[7] Raja A, Chukka S T, Jayaganthan R. Prediction of fatigue crack growth behaviour in ultrafine
     grained Al 2014 alloy using machine learning[J]. Metals, 2020, 10(10): 1349.
[8] Wang B, Zhao W, Du Y, et al. Prediction of fatigue stress concentration factor using extreme
     learning machine[J]. Computational Materials Science, 2016, 125: 136-145.
[9] Mettu S R, Shivakumar V, Beek J M, et al. NASGRO 3.0: A software for analyzing aging
     aircraft[C]//The Second Joint NASA/FAA/DoD Conference on Aging Aircraft. 1999 (Pt. 2).
[10] NASGRO F M. Fatigue Crack Growth Analysis Software[J]. Reference manual, 2002.
[11] Brownlee J. XGBoost With python: Gradient boosted trees with XGBoost and scikit-learn[M].
     Machine Learning Mastery, 2016.
[12] Noble W S. What is a support vector machine?[J]. Nature biotechnology, 2006, 24(12): 1565-1567.
[13] Myles A J, Feudale R N, Liu Y, et al. An introduction to decision tree modeling[J]. Journal of
     Chemometrics: A Journal of the Chemometrics Society, 2004, 18(6): 275-285.
[14] Nelder J A, Wedderburn R W M. Generalized linear models[J]. Journal of the Royal Statistical
     Society: Series A (General), 1972, 135(3): 370-384.

