Predicting Service Life of Polyethylene Pipes under Crack Expansion using "Random Forest" Method

Understanding the factors that influence the resistance of PE pipe to rapid crack expansion (rapid crack propagation, RCP) is of great significance for the safe use of PE pipe. Starting from the theoretical basis of random forest, this paper analyzes the role of each step in the algorithm and proposes an improved random forest method based on recursive feature elimination, which changes the node splitting rule to address the limited classification accuracy of the standard random forest. The method is used to analyze how pipe size and wall thickness, impact knife speed, and the notched impact strength of simply supported beams affect rapid crack expansion of PE pipe. Under the same conditions, the extended crack lengths of DN260, DN150 and DN65 pipes are 197, 164 and 128 mm, respectively, while the crack lengths of PE80 pipes are 24, 210 and 239 mm at impact knife speeds of 10, 15 and 20 m/s, respectively. The higher the notched impact strength of the simply supported beam, the higher the critical pressure value and the better the RCP resistance. The study of rapid crack expansion of PE pipe based on a machine learning algorithm can identify the main internal and external factors affecting the RCP resistance of PE pipe and provide a solid basis for PE


1. INTRODUCTION
Polyethylene pipes have many advantages over traditional metal pipes and have become the preferred choice for urban pipeline networks [1,2]. Polyethylene pipes have excellent toughness, with a minimum elongation at break of 350% required in tensile testing. They can therefore usually undergo large deformation and adapt well to foundation settlement and pipeline deflection [3][4][5]. In impact tests, brittle fracture occurs only when the specimen is sharply notched within the service temperature range [6][7][8]. Polyethylene is an inert material, and at 20°C it resists corrosion by strong acids and alkalis [9], which removes the need for the strict corrosion protection required when laying traditional pipelines [10]. Polyethylene pipes are easy to install and have good welding properties, owing to the thermoplastic (melt-fusible) nature of polyethylene [11][12][13].
With the improvement of raw material performance, the resistance of PE pipes to crack initiation and to rapid crack expansion has increased dramatically. Most of the methods for evaluating rapid cracking of PE pipes [14][15][16] suffer from complex experimental conditions and poor reproducibility, which seriously restricts the development of PE pipes [17][18][19]. Many aspects of PE pipe manufacturing, welding and laying, testing and maintenance remain open to research. Studying the resistance of PE pipe materials to rapid crack growth can provide scientific guidance for material selection, welding, inspection and evaluation, life prediction and other key issues, and can promote the safe, standardized and stable development of PE pipes through a deeper understanding of their performance.
*Corresponding Author Email: 75829472@qq.com (T. Yifan)
With the increasing use of polyethylene pipes, more and more researchers have investigated their rapid crack expansion. Nikolaev and Zaripova [20], Baktizin et al. [21] and Vasiliev et al. [22] tested the resistance to rapid crack expansion of different types of single-peaked MDPE pipes and indicated that single-peaked MDPE pipes have sufficient RCP resistance to be used in gas distribution systems. Because the conventional S4 test of plastic pipes against rapid crack expansion consumes a large amount of material and time, Liu and Kleiner [23], Naseri and Barabady [24] and Rajeev and Kodikara [25] proposed a small-scale, accelerated and reliable testing method, which was experimentally verified for double-peaked MDPE pipes. Thaduri et al. [26], Transport [27] and Sepideh et al. [28] proposed a new method for evaluating the resistance of PE pipes to rapid crack expansion at low temperatures, addressing the problem that the traditional evaluation method is not comprehensive enough. Also, Enrico [29], Kim et al. [30] and Mohamed and Jawhar [31] proposed a method for evaluating rapid crack expansion in PE pipes by analyzing its main causes. On the other hand, Narayanan and Sankaranarayanan [32] and Gafarova [33] investigated fatigue crack extension in HDPE pipes from the point of view of molecular weight distribution and proposed a possible link between the potential failure mechanism of cracked primary fibers and feedback kinetics. Mohamed and Jawhar [34], Shammazov et al. [35] and Jin and Eydgahi [36] analyzed the rapid crack expansion of polyethylene pressure pipes using simulation methods and proposed an experimental method for testing the resistance of polyethylene pipes to rapid crack expansion.
In this paper, we investigate life prediction for PE pipe under fast crack extension based on machine learning. First, the theoretical basis of random forest is studied: decision trees are constructed following the Bagging idea, and results are derived by voting within a classification model composed of multiple decision trees. Second, recursive feature elimination is proposed to remedy the defects of random forest, and the accuracy of the random forest is improved by changing the node splitting rule. Then, fast crack expansion experiments are designed for different PE pipe sizes in different environments, where the special material is divided into PE80 and PE100, the pipe sizes are DN160, DN63 and DN315, and the SDR is fixed at 11. Finally, the experimental data are analyzed with random forest based on recursive feature elimination to study the effects of pipe size and wall thickness, impact knife speed, and the notched impact strength of the simply supported beam on the RCP resistance of PE pipe.

1. Improved Random Forest Algorithm
The random forest algorithm uses CART trees as base classifiers. The Bagging algorithm randomly draws a subset of the original data set as training samples, a subset of features is then randomly selected from the features of the training samples, and a decision tree is generated on this basis; the classification result of the random forest is obtained by voting among the decision trees. Because both the samples and the attributes are selected by random sampling when the forest is constructed, the random forest has good generalization ability.
Random forest (RF) is an ensemble learning algorithm, essentially a combined classifier composed of a large number of decision trees [37,38]. The final class of the random forest is obtained by voting among a large number of decision trees (Vasiliev et al. [22], Liu et al. [39], Palaev et al. [40]). The algorithm combines the random subspace idea with the best-split idea, imposes few restrictions on hyperparameters, has a simple and easily understood structure, is not prone to overfitting, and handles missing and unbalanced data sets well [41][42][43][44]. Its model training and prediction are efficient and stable, so it is widely used in clustering, regression and classification. The principle of the random forest algorithm is shown in Figure 1.
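The construction described above (bootstrap sampling, a random feature subset per tree, and majority voting) can be illustrated with a minimal pure-Python sketch. It is not the implementation used in this study: one-level decision stumps stand in for full CART trees, and the function names (`fit_stump`, `fit_forest`, `predict`) and the toy data are assumptions made for illustration.

```python
import random
from collections import Counter

def fit_stump(rows, labels, feature_ids):
    """One-level tree (a stand-in for CART): pick the (feature, threshold)
    pair with the fewest training errors among the candidate features."""
    best = None
    for f in feature_ids:
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            l_lab = Counter(left).most_common(1)[0][0]
            r_lab = Counter(right).most_common(1)[0][0] if right else l_lab
            err = sum(y != l_lab for y in left) + sum(y != r_lab for y in right)
            if best is None or err < best[0]:
                best = (err, f, t, l_lab, r_lab)
    _, f, t, l_lab, r_lab = best
    return lambda r: l_lab if r[f] <= t else r_lab

def fit_forest(rows, labels, n_trees=25, m=1, seed=0):
    """Bagging plus random subspace: each tree sees a bootstrap sample
    and only m randomly chosen features."""
    rng = random.Random(seed)
    n, n_features = len(rows), len(rows[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]   # sample with replacement
        feats = rng.sample(range(n_features), m)     # random feature subset
        forest.append(fit_stump([rows[i] for i in idx],
                                [labels[i] for i in idx], feats))
    return forest

def predict(forest, row):
    """Majority vote over the predictions of all trees."""
    return Counter(tree(row) for tree in forest).most_common(1)[0][0]

# Hypothetical toy data: two features, two classes (not measured values).
rows = [(1, 1), (2, 1), (1, 2), (2, 2), (8, 9), (9, 8), (8, 8), (9, 9)]
labels = ["arrest"] * 4 + ["propagate"] * 4
forest = fit_forest(rows, labels, n_trees=25, m=1, seed=0)
print(predict(forest, (1, 1)), predict(forest, (9, 9)))
```

Individual stumps may be weak, for example when a bootstrap sample happens to contain a single class, but the vote over many trees is what the ensemble relies on.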
Random forest is a common ensemble algorithm that combines a large number of decision trees. A single decision tree has poor classification accuracy and is a typical weak classifier [45][46][47]. Integrating weak classifiers can significantly improve the accuracy of the overall classifier. Each decision tree of the random forest is computed separately from a different bootstrap sample, and the resulting trees are gathered to form a forest. The classification error of the random forest is determined by the classification performance of the individual trees and the degree of correlation between trees [48][49][50]. Random forest modifies the node splitting of the decision tree: a conventional decision tree selects the best feature among all features as the basis for splitting a node [51,52], whereas the random forest first selects some features at random, for a higher degree of generalization, and then selects the best feature among them [53,54].
The forest is built in the following steps. First, bootstrap samples are drawn from the original sample set T to build training subsets, and each subset is used to build one decision tree. To form a random forest with n decision trees, n bootstrap sample sets are first obtained by the Bagging algorithm. The data that are not drawn are recorded as out-of-bag (OOB) data, and the generalization ability of the random forest is measured by the OOB error. This error reflects the classification accuracy and can also be used to judge the variable importance measure (VIM). The OOB error is calculated for each decision tree and calculated again after randomly permuting the values of variable $x_i$ in the out-of-bag data; the difference between the two, averaged over all trees, is the VIM value of the variable.
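The VIM of variable $x_i$ in the $j$-th tree is:

$$VIM_j^{OOB}(x_i) = \frac{\sum_{m=1}^{N} I\left(Y_m = \hat{Y}_m^{j}\right)}{N} - \frac{\sum_{m=1}^{N} I\left(Y_m = \hat{Y}_{m,\pi_i}^{j}\right)}{N} \quad (1)$$

The VIM of variable $x_i$ in the random forest is:

$$VIM^{OOB}(x_i) = \frac{1}{n}\sum_{j=1}^{n} VIM_j^{OOB}(x_i) \quad (2)$$

where $N$ is the number of OOB observations of the $j$-th tree, $Y_m$ is the true class of the $m$-th OOB observation, $\hat{Y}_m^{j}$ and $\hat{Y}_{m,\pi_i}^{j}$ are the estimates of the $j$-th tree for the $m$-th OOB observation before and after the random permutation of $x_i$, $n$ is the number of trees, and $I(\cdot)$ is the indicator function, equal to 1 when its two arguments are equal and 0 otherwise.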
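The out-of-bag importance described above can be sketched in a few lines of Python. This is an illustrative sketch only: a tree is represented as any callable that maps a feature tuple to a class label, and the names `oob_vim` and `forest_vim` as well as the toy tree and data are hypothetical.

```python
import random

def oob_vim(tree, oob_rows, oob_labels, i, rng):
    """Importance of feature i for one tree: fraction of OOB samples
    classified correctly before minus after permuting column i."""
    n = len(oob_rows)
    before = sum(tree(r) == y for r, y in zip(oob_rows, oob_labels))
    column = [r[i] for r in oob_rows]
    rng.shuffle(column)                                   # random permutation of x_i
    permuted = [r[:i] + (v,) + r[i + 1:] for r, v in zip(oob_rows, column)]
    after = sum(tree(r) == y for r, y in zip(permuted, oob_labels))
    return (before - after) / n

def forest_vim(trees_with_oob, i, seed=0):
    """Average the per-tree importance of feature i over all trees.
    trees_with_oob is a list of (tree, oob_rows, oob_labels) triples."""
    rng = random.Random(seed)
    scores = [oob_vim(t, rows, labels, i, rng) for t, rows, labels in trees_with_oob]
    return sum(scores) / len(scores)

# Hypothetical tree that only looks at feature 0, with hypothetical OOB data.
tree = lambda r: "propagate" if r[0] > 5 else "arrest"
oob_rows = [(1, 7), (2, 3), (8, 1), (9, 6)]
oob_labels = ["arrest", "arrest", "propagate", "propagate"]
vim_used = forest_vim([(tree, oob_rows, oob_labels)], 0)
vim_unused = forest_vim([(tree, oob_rows, oob_labels)], 1)
print(vim_used, vim_unused)
```

Permuting a feature that the tree never uses leaves every prediction unchanged, so its importance is exactly zero, which is the behavior the measure is meant to capture.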
(2) Constructing decision trees. The n bootstrap sample sets each generate one classification tree, giving n trees in total. Let M be the number of features of a sample. A traditional decision tree selects the best feature among all M features, whereas the random forest first randomly selects m ($m \ll M$) features from the M features; each node is then split on the optimal feature among these m features, and the classification trees are fully grown without pruning.
(3) Voting for the final classification result. By constructing different training subsets, the random forest algorithm increases the diversity of the decision trees, which in turn improves the accuracy of the forest as a whole. The n decision tree models eventually produce n classification results $h_1(X), h_2(X), \dots, h_n(X)$, and a majority vote over these results gives the final class. The edge (margin) function of the classifier set is:

$$\mathrm{margin}(X, Y) = av_k\, I(h_k(X) = Y) - \max_{j \neq Y} av_k\, I(h_k(X) = j) \quad (4)$$

In Equation (4), $I(\cdot)$ is the indicator function, $av_k(\cdot)$ is the mean over the n trees, and $j$ is an incorrect class. $\mathrm{margin}(X, Y)$ describes the ability of the classifier to correctly classify the sample $(X, Y)$: it is the difference between the average number of votes for the correct class and the maximum average number of votes for any incorrect class. A higher value of $\mathrm{margin}(X, Y)$ indicates a higher confidence level of the classifier and a more reliable classification model.
We expect the edge (margin) function $\mathrm{margin}(X, Y)$ of the classification model set H to be high, which means that the number of base classifiers that classify correctly exceeds the number that classify incorrectly, i.e., $\mathrm{margin}(X, Y) > 0$. However, incorrect classification results do occur, and they are quantified by the generalization error. The generalization error of the set of classification models is calculated as:

$$PE^* = P_{X,Y}\left(\mathrm{margin}(X, Y) < 0\right)$$

where $\mathrm{margin}(X, Y) < 0$ denotes that the test sample is misclassified by the combined classifier, and $PE^*$ is the probability of that event. A low generalization error therefore means that the model classifies better.
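As a small illustration of majority voting, the edge (margin) function and the generalization error, the following Python sketch computes all three from the class labels predicted by the individual trees. The function names and the example votes are assumptions for illustration, and the error is the empirical fraction of samples with negative margin rather than the true probability.

```python
from collections import Counter

def vote(predictions):
    """Majority vote over the class labels predicted by the trees."""
    return Counter(predictions).most_common(1)[0][0]

def margin(predictions, true_label):
    """Share of votes for the true class minus the largest share of
    votes received by any other class."""
    n = len(predictions)
    counts = Counter(predictions)
    correct = counts.get(true_label, 0) / n
    wrong = max((c / n for label, c in counts.items() if label != true_label),
                default=0.0)
    return correct - wrong

def generalization_error(all_predictions, true_labels):
    """Empirical estimate of PE*: fraction of samples with negative margin."""
    margins = [margin(p, y) for p, y in zip(all_predictions, true_labels)]
    return sum(m < 0 for m in margins) / len(margins)

# Hypothetical votes of three trees on two samples whose true class is "a".
votes_per_sample = [["a", "a", "b"], ["b", "b", "a"]]
true_labels = ["a", "a"]
print(vote(votes_per_sample[0]),
      margin(votes_per_sample[0], "a"),
      generalization_error(votes_per_sample, true_labels))
```

The first sample has a positive margin and is classified correctly; the second has a negative margin and counts toward the error.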
According to the law of large numbers and the structure of the decision tree itself, it can be proved that the generalization error converges to a fixed value as the number of decision trees in the random forest grows, satisfying Equation (7):

$$\lim_{m \to \infty} PE^* = P_{X,Y}\left(P_\Theta(h(X,\Theta) = Y) - \max_{j \neq Y} P_\Theta(h(X,\Theta) = j) < 0\right) \quad (7)$$

where $m$ is the random forest size, $\Theta$ is the random vector of the individual classification models, and $h(X,\Theta)$ is the output of a classification model based on the attribute features $X$ and $\Theta$.
As the random forest grows in size, the generalization error gradually converges to a fixed value, so adding more decision trees does not cause overfitting.
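The edge function of the random forest is defined as:

$$mr(X, Y) = P_\Theta(h(X,\Theta) = Y) - \max_{j \neq Y} P_\Theta(h(X,\Theta) = j)$$

Let

$$\hat{j}(X, Y) = \arg\max_{j \neq Y} P_\Theta(h(X,\Theta) = j)$$

From the above equations we can derive:

$$mr(X, Y) = E_\Theta\left[I(h(X,\Theta) = Y) - I(h(X,\Theta) = \hat{j}(X, Y))\right] = E_\Theta\left[rmg(\Theta, X, Y)\right]$$

where $rmg(\Theta, X, Y) = I(h(X,\Theta) = Y) - I(h(X,\Theta) = \hat{j}(X, Y))$ is the raw edge function of a single tree. The variance of the edge function follows as:

$$\mathrm{var}(mr) = E_{\Theta,\Theta'}\left[\mathrm{cov}_{X,Y}\left(rmg(\Theta, X, Y),\, rmg(\Theta', X, Y)\right)\right] = E_{\Theta,\Theta'}\left[\rho(\Theta,\Theta')\, sd(\Theta)\, sd(\Theta')\right]$$

where $\Theta$ and $\Theta'$ are independent random vectors with the same distribution, $\rho(\Theta,\Theta')$ is the correlation between $rmg(\Theta, X, Y)$ and $rmg(\Theta', X, Y)$, and $sd(\Theta)$ is the standard deviation of $rmg(\Theta, X, Y)$. Therefore $\mathrm{var}(mr)$ simplifies to:

$$\mathrm{var}(mr) = \bar{\rho}\left(E_\Theta\, sd(\Theta)\right)^2$$

where $\bar{\rho}$ is the mean value of the base classifier correlation.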
Let $s$ be the average strength of the base classifiers and $\bar{\rho}$ the mean correlation between them. The upper bound of the generalization error $PE^*$ can be obtained as:

$$PE^* \leq \frac{\bar{\rho}\,(1 - s^2)}{s^2}$$

A larger upper bound indicates that more samples may be misclassified and that the overall classification of the combined classifier is poorer. The classification accuracy of the combined classifier is thus related to the correlation between the classifiers and to the classification ability of each individual classifier. Therefore, the classification accuracy of random forest can be improved by reducing the correlation between decision trees and improving the classification accuracy of each decision tree.
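A short numerical check makes the role of the two quantities concrete. The sketch below assumes that the bound takes the standard form from Breiman's random forest analysis, $PE^* \leq \bar{\rho}(1 - s^2)/s^2$; the function name and the example values of the mean correlation and strength are hypothetical.

```python
def generalization_bound(mean_correlation, strength):
    """Upper bound rho_bar * (1 - s**2) / s**2 on the generalization error.
    Assumes 0 < strength <= 1; a bound above 1 carries no information."""
    if not 0 < strength <= 1:
        raise ValueError("strength must lie in (0, 1]")
    return mean_correlation * (1 - strength ** 2) / strength ** 2

# Hypothetical values: lower correlation or higher strength tightens the bound.
baseline = generalization_bound(0.3, 0.6)
less_correlated = generalization_bound(0.1, 0.6)
stronger_trees = generalization_bound(0.3, 0.8)
print(baseline, less_correlated, stronger_trees)
```

Both changes lower the bound, which is consistent with the two routes to higher accuracy named above: decorrelating the trees and strengthening each tree.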
Classification accuracy is the most direct way to verify the performance of random forest; it characterizes how well the actual labeled categories match the categories assigned by the algorithm. Random forest is a high-accuracy algorithm among classification algorithms, and although its performance varies across datasets, its accuracy generally remains in the range of 70% to 90%.

Improved Random Forest based on Recursive Feature Elimination
In this section, random forest is combined with recursive feature elimination (RFE) to form RF-RFE, which determines the size of the final feature subset more rationally and avoids the influence of human factors. RFE is a strategy built around a machine learning method: in each iteration, a model is constructed from the current set of features, and the importance of those features is evaluated from the performance of the model.
When the RF-RFE algorithm is used for feature selection, the random forest algorithm is first applied to obtain an importance ranking of the features, and the feature with the smallest importance is deleted according to the principle of backward iteration. The random forest algorithm is then applied to the remaining features to obtain a new importance ranking, and the least important feature is again deleted. In each iteration, RF-RFE re-evaluates the current set of remaining features, so the score of each feature is adjusted over repeated iterations. This overcomes the drawback that feature selection with a single random forest needs repeated trials to obtain a feature subset, and makes the resulting feature subset both more reliable and of better quality.
Feature selection with RF-RFE begins with the random forest stage: the bootstrap resampling method draws multiple samples from the original sample, a decision tree is constructed for each bootstrap sample, and all the decision trees together constitute the random forest, from which the feature importance in the regression model is calculated. Backward iterative feature evaluation is then introduced, and the features with small importance are removed. The random forest algorithm is applied again to calculate the importance of the remaining features, and this is repeated until only one feature is left. Finally, the optimal feature set is selected according to the correlation coefficient and the root mean square error. The flow chart of the RF-RFE algorithm is shown in Figure 2.
The process of the RF-RFE algorithm for feature selection is:
Step 1: Assuming that the original number of data samples is n, bootstrap sampling is applied to randomly draw b sample subsets with replacement, and b regression trees are constructed from these subsets. The samples that are not drawn in each bootstrap round form b sets of out-of-bag data, which serve as the test samples of the random forest.
Step 2: At each node of each regression tree, a random subset of the variables in the original sample set is selected as candidate variables, and the optimal split among them is chosen according to a given criterion, so that each decision tree grows to its maximum size.
Step 5: After the mean decrease in MSE has been calculated, the feature with the smallest importance is deleted first, following the principle of backward iteration. Steps 1-4 are then repeated on the remaining features, and features of small importance are deleted one by one until a single feature is left. Once the results are output, the feature set with the smallest root mean square error and the largest correlation coefficient is selected as the result of feature selection.
Because the remaining features are re-evaluated in every iteration and their scores are adjusted accordingly, RF-RFE avoids the repeated trials that feature selection with a single random forest requires.
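The backward-elimination loop of RF-RFE can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's implementation: the importance ranking and the error of each feature subset are supplied as callables (`importance_fn`, `rmse_fn`), which in practice would refit a random forest on the remaining features; the sketch selects the subset by RMSE only and leaves out the correlation coefficient criterion. The feature names and toy scoring functions are hypothetical.

```python
def rfe(features, importance_fn, rmse_fn):
    """Recursive feature elimination: repeatedly drop the least important
    feature and remember the error of every intermediate subset."""
    remaining = list(features)
    history = []
    while True:
        history.append((rmse_fn(remaining), list(remaining)))
        if len(remaining) == 1:
            break
        scores = importance_fn(remaining)              # re-evaluated every round
        weakest = min(remaining, key=lambda f: scores[f])
        remaining.remove(weakest)
    best_rmse, best_subset = min(history, key=lambda item: item[0])
    return best_subset, history

# Hypothetical stand-ins for "fit a random forest and score it".
FIXED_IMPORTANCE = {"pipe_size": 0.50, "knife_speed": 0.30,
                    "impact_strength": 0.15, "noise": 0.05}
INFORMATIVE = {"pipe_size", "knife_speed", "impact_strength"}

def toy_importance(subset):
    return {f: FIXED_IMPORTANCE[f] for f in subset}

def toy_rmse(subset):
    # Error grows when an informative feature is missing or noise is kept.
    missing = len(INFORMATIVE - set(subset))
    return 1.0 * missing + (0.2 if "noise" in subset else 0.0)

best_subset, history = rfe(list(FIXED_IMPORTANCE), toy_importance, toy_rmse)
print(best_subset)
```

Passing the scoring functions in as arguments keeps the elimination logic separate from the model, so the same loop works with any importance measure.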

1. PE Pipe Rapid Crack Expansion Test
Rapid crack propagation (RCP) of polyethylene (PE) pipes refers to the phenomenon in which a crack, initiated when a pipe in service is subjected to external forces (e.g., building construction, irregular welding), is driven by the stress produced by the pressure of the medium inside the pipe (e.g., tap water, natural gas) and expands along the length of the pipe at a rate of several hundred meters per second [55,56].
The fluid pressure inside the pipe induces stress in the pipe wall, which therefore stores strain energy. When rapid crack growth occurs in the pipe wall, the wall changes from a stressed state to a stress-free state, and the strain energy originally stored in the wall is released to create new crack area. The released strain energy thus acts as the crack driving force [44,57]. It is transported to the crack tip by stress waves in the pipe wall material, which travel at the speed of sound in that material.
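To give a sense of the magnitude of the wall stress involved, the sketch below uses the common thin-wall approximation for pressure pipes, hoop stress = p (SDR - 1) / 2, with SDR defined as outside diameter divided by wall thickness. The approximation, the function names and the chosen pressure of 0.8 MPa are assumptions for illustration and are not taken from the experiments.

```python
def wall_thickness_mm(outside_diameter_mm, sdr):
    """Nominal wall thickness from SDR = outside diameter / wall thickness."""
    return outside_diameter_mm / sdr

def hoop_stress_mpa(pressure_mpa, sdr):
    """Thin-wall approximation of the circumferential stress in the pipe wall."""
    return pressure_mpa * (sdr - 1) / 2

# Illustrative case: a DN160 SDR11 pipe at an internal pressure of 0.8 MPa.
thickness = wall_thickness_mm(160, 11)
stress = hoop_stress_mpa(0.8, 11)
print(round(thickness, 2), round(stress, 2))
```

With the SDR fixed, the hoop stress at a given pressure is the same for every pipe diameter, so differences between pipe sizes in the tests come from wall thickness and diameter rather than from the nominal stress level.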
In this paper, the factors affecting the service life of PE pipes were obtained through rapid crack expansion experiments, and the resulting data were then analyzed with the RF-RFE algorithm to establish the life prediction method. The samples of PE pipe special material are described in Table 1. Pipe series with larger outside diameters and thicker walls were selected for testing, while pipe series DN160 (SDR11), DN63 (SDR11) and DN315 (SDR11) were selected for comparative testing, in order to compare the effect of different wall thicknesses and sizes on the rapid crack expansion of PE pipe. The finished pipe samples are shown in Table 2.
Experimental steps:
Step 1: Condition the pipes in a cryogenic cabinet at (0±2)°C for the time required by the standard for the pipe wall thickness.
Step 2: Fill the pipe sample with fluid (air or water, usually air).

TABLE 1. Samples of special materials for polyethylene pipes
Step 3: Set the test temperature and pressure.
Step 4: Strike one end of the pipe to initiate a rapidly propagating longitudinal crack.
Step 5: Use the internal baffles and external locating rings of the test setup to limit flaring of the pipe wall after cracking and to prevent rapid decompression ahead of the crack (in the uncracked portion of the sample).
Step 6: Keep the temperature constant and vary the pressure to find the critical point between crack arrest and crack propagation (a crack length of 4.7 times the outside diameter). The higher the critical pressure $P_{c,S4}$, the better the resistance of the material to crack expansion.
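Step 6 can be expressed as a small calculation: each test is classified as arrest or propagation by comparing the crack length with 4.7 times the outside diameter, and the critical pressure is bracketed between the highest arrest pressure and the lowest propagation pressure. The sketch below assumes that a crack length equal to or above the threshold counts as propagation; the function names and the pressure and crack-length pairs are hypothetical, not measured values.

```python
def classify(crack_length_mm, outside_diameter_mm, factor=4.7):
    """Label one test: propagation if the crack reaches factor * OD
    (the >= convention is an assumption of this sketch)."""
    threshold = factor * outside_diameter_mm
    return "propagation" if crack_length_mm >= threshold else "arrest"

def critical_pressure_bracket(results, outside_diameter_mm):
    """Return (highest arrest pressure, lowest propagation pressure), or
    None if the tests do not bracket the transition.
    results is a list of (pressure_mpa, crack_length_mm) pairs."""
    arrests = [p for p, a in results if classify(a, outside_diameter_mm) == "arrest"]
    props = [p for p, a in results if classify(a, outside_diameter_mm) == "propagation"]
    if not arrests or not props:
        return None
    lowest_propagation = min(props)
    below = [p for p in arrests if p < lowest_propagation]
    if not below:
        return None
    return max(below), lowest_propagation

# Hypothetical test series on a DN160 pipe (threshold 4.7 * 160 = 752 mm).
tests = [(0.2, 150), (0.4, 300), (0.6, 480), (0.8, 900), (1.0, 1400)]
bracket = critical_pressure_bracket(tests, 160)
print(bracket)
```

Narrowing the bracket with further tests between the two pressures gives a sharper estimate of the critical pressure.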

2. Determination of Critical Pressure
When the test pressure was below 0.8 MPa, the crack length increased very slowly with increasing test pressure, the curve was relatively flat, and the crack length at each pressure point did not exceed 500 mm. Beyond this range the crack length changed abruptly, after which, with a further increase in pressure, the growth of the crack length slowed again and the curve became nearly horizontal. This indicates a sudden change in the crack extension of the material as the pressure increases, i.e., there is a critical RCP pressure $P_c$, which results from the ductile-brittle transition of the internal structure of the pipe. The determination of the critical pressure is shown in Figure 3.
The critical pressure values for tubes GS-002 to GS-015 can be obtained in the same way.
The RF-RFE algorithm was used to analyze the experimental data, and the main factors influencing the RCP of PE pipe were found to be pipe size and wall thickness, impact knife speed, and the notched impact strength of the simply supported beam. The effect of pipe size and wall thickness on rapid crack expansion is shown in Figure 4. Under the same experimental conditions, the extended crack length of PE100-3 pipe is 203 mm for size DN260, 170 mm for size DN150 and 136 mm for size DN65. For PE100-2 pipe, the extended crack lengths for sizes DN260, DN150 and DN65 are 197, 164 and 128 mm, respectively. Across the different pipe materials, the average crack length is 189 mm for size DN260, 156 mm for size DN150 and 122 mm for size DN65.
The RCP experiments show that the critical pressure values of different materials differ greatly and are not related to basic physical parameters of the PE pipe material such as density and melt flow rate. For the same pipe material extruded to different pipe diameters, the critical pressure increases as the pipe outside diameter decreases and the wall becomes thinner, and no RCP damage occurs at 0°C when the pipe diameter is reduced to DN63. This indicates that the larger the outside diameter of the pipe and the thicker the wall, the greater the risk of RCP damage at low temperatures.

DISCUSSION
The effect of impact knife speed on crack extension is shown in Figure 5. For GS-001 pipe, the crack length was 11 mm at an impact knife speed of 9 m/s and 24, 210 and 239 mm at impact knife speeds of 10, 15 and 20 m/s, respectively. For GS-003 pipe, the crack length was 17 mm at an impact knife speed of 9 m/s and 55, 252 and 256 mm at impact knife speeds of 10, 15 and 20 m/s, respectively. When the impact knife speed is below 9 m/s, the pipe is not damaged, and an effective impact occurs only when the impact knife speed exceeds 9 m/s. However, the crack length does not increase rapidly with increasing impact speed, and it shows no obvious change when the impact knife speed exceeds 20 m/s.
A comparison of the notched impact strength of the simply supported beam and the critical pressure values is shown in Figure 6. A specific feature of polymers is their ability to deform over time under applied loads. For our samples, the impact strength for the polyethylene pipeline was 1.021 MPa and 1.56, 1.819, 2.029, 0.225 MPa for pipe diameters of 31.9 mm, 18.2 mm, 17.4 mm, at an ambient temperature of 23°C. The impact toughness for polyethylene pipework was always greater than or equal to 1.8 MPa. The samples with lower notched impact strength of the simply supported beam also had lower critical pressure values, and the samples with the highest notched impact strength also had better RCP performance, which indicates that there is a relationship between the impact strength and the RCP performance of the material. At the same time, the relationship between notched impact strength and critical pressure is not linear: for example, the impact strength of GS-001 is higher than that of GS-002, but the critical pressure of GS-002 is higher than that of GS-001. This indicates that the notched impact strength and the critical pressure values of the special materials for pipes do not correspond completely.

CONCLUSION
Operating experience with gas pipelines made of polyethylene pipes has shown high resistance of the material to natural gas and lower resistance to gaseous propane-butane mixtures. Exposure to the vapour phase of these gases causes the material to swell, and prolonged contact with the liquid phase causes it to lose some of its mass. This is particularly true of low-density polyethylene, which swells considerably when exposed to these gases.
Like paraffins, polyethylene is inert to many other substances, such as water, acids and alkalis. The active substances that have some effect on polyethylene include aromatic hydrocarbons (benzene, toluene, xylene), alcohols (methyl, ethyl), oils (vegetable, mineral, silicone), animal fats, inorganic oils (metal-containing oils) and synthetic detergents. The impact of active media is more pronounced on polyethylene structures under stress. Passive substances include water, inorganic acids, inorganic salts, polyhydric alcohols (glycerine, glycol), paraffins, etc.
This paper analyzes the factors influencing rapid crack expansion in PE pipe using a random forest algorithm with recursive feature elimination. The average crack length for the DN260 size is 189 mm, and within 20 m/s, the faster the impact knife speed, the longer the crack. Many factors affect the rapid cracking of PE pipes, chiefly the properties of the material itself and external factors.
(1) Pipes prepared from different types of PE resin differ in their ability to resist rapid crack expansion; for example, pipes prepared from PE100 perform better than PE80 pipes.
(2) The larger the diameter of the PE pipe, the more likely rapid crack expansion becomes, while the effect of wall thickness on rapid cracking of the pipe depends on the situation.
(3) Under the same conditions, the higher the critical pressure that the pipe can withstand, the better its resistance to rapid crack expansion.
(4) The service temperature directly affects the flexibility of PE pipe, and using PE pipe at low temperatures makes rapid crack expansion more likely.

Here $H = \{h_1(X), h_2(X), \dots, h_n(X)\}$ is a classification model system consisting of n decision tree models, and voting over the combined classification model yields the overall classification category. In the edge function, $I(\cdot)$ is the indicator function, $j$ denotes an incorrect class, and $av_k(\cdot)$ denotes the mean value over the trees. The classification ability of the classification model set H is influenced by the individual classification models, and the combined classification results of the individual models give the classification performance of the entire set H. The classification ability of H can be described as the expected value of the edge function.

Figure 2. The RF-RFE algorithm flow chart

Step 3: The set of b regression trees generated in Step 1 forms the random forest regression model, and the performance of the model is evaluated using the mean square of the residuals predicted on the out-of-bag data, $MSE_{OOB}$:

$$MSE_{OOB} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i^{OOB}\right)^2$$

where $y_i$ is the actual value of the dependent variable in the out-of-bag data and $\hat{y}_i^{OOB}$ is the value predicted by the random forest for the out-of-bag data.
Step 4: The mean decrease in MSE is calculated from the mean square of the residuals predicted on the out-of-bag data. The importance of a variable in the random forest regression can be measured by its mean decrease in MSE, where a larger value indicates a more important feature.
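The two quantities in Steps 3 and 4 reduce to simple arithmetic once the out-of-bag predictions are available. The sketch below assumes one common convention for the importance score, namely the out-of-bag MSE obtained after permuting a variable minus the MSE obtained before permuting it; the function names and numbers are hypothetical.

```python
def oob_mse(actual, predicted):
    """Mean square of the residuals between actual and OOB-predicted values."""
    return sum((y - p) ** 2 for y, p in zip(actual, predicted)) / len(actual)

def mse_change(actual, predicted, predicted_after_permutation):
    """Importance score of one variable: how much the OOB MSE worsens
    when that variable is permuted (assumed convention)."""
    return oob_mse(actual, predicted_after_permutation) - oob_mse(actual, predicted)

# Hypothetical OOB values (not measured data).
actual = [1.0, 2.0, 3.0]
predicted = [1.0, 2.0, 2.0]
predicted_permuted = [3.0, 2.0, 1.0]
baseline_mse = oob_mse(actual, predicted)
importance = mse_change(actual, predicted, predicted_permuted)
print(baseline_mse, importance)
```

A variable whose permutation barely changes the out-of-bag error receives a score near zero and is the first candidate for removal in the backward iteration.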

TABLE 2. Tube samples for main test purposes

Figure 5. Effect of impact knife velocity on crack propagation

Figure 6. Effect of notched impact strength of simple support beam on crack propagation

A simple voting operation is applied to all the decision tree classification results of the random forest to derive the category result:

$$H(X) = \arg\max_{A} \sum_{i=1}^{n} I(h_i(X) = A)$$

where $A$ represents a candidate category and $I(\cdot)$ is the indicator function.