Classification and regression trees (CART)
Classification and regression trees use binary recursive partitioning to divide the input data into distinct subsets.
For each predictor, every candidate binary split is considered: for a continuous variable, above versus below a given threshold; for a categorical variable, each division of its levels into two groups. Each candidate split partitions the sample into two distinct subsets, and the split that yields the greatest reduction in impurity is selected.
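The split search described above can be sketched for a single continuous predictor as follows. This is a minimal illustration, not the rpart implementation: it assumes Gini impurity as the impurity measure, uses midpoints between consecutive distinct values as candidate thresholds, and the names `gini` and `best_split` and the toy data are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum_k p_k^2."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(values, labels):
    """Return (threshold, impurity_reduction) for the best binary split
    of one continuous predictor, or (None, 0.0) if no split reduces impurity."""
    n = len(values)
    parent_impurity = gini(labels)
    best_threshold, best_gain = None, 0.0
    distinct = sorted(set(values))
    # Candidate thresholds: midpoints between consecutive distinct values.
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        # Impurity of the children, weighted by their share of the sample.
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / n
        gain = parent_impurity - weighted
        if gain > best_gain:
            best_threshold, best_gain = t, gain
    return best_threshold, best_gain


# Toy data (hypothetical): one predictor, two perfectly separable classes.
x = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
y = ["a", "a", "a", "b", "b", "b"]
threshold, gain = best_split(x, y)
print(threshold, gain)  # 6.5 0.5
```

In a full tree, the same search would run over every predictor, and the winning split would be applied before recursing into each of the two resulting subsets.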
The partitioning is then repeated recursively on each resulting subset until a predefined stopping rule is satisfied [8]. A CART model yields explicit decision rules, which makes it easy to interpret. An advantage of such tree-based methods is that they do not assume linearity or any parametric form for the relationship between the outcome variable and the predictors [7].
Classification and regression trees were built using the “rpart” package of the R statistical programming environment. In our study, the default rpart criteria were used, with the complexity parameter cp=0.01 and 10-fold cross-validation.
Random Forest
A random forest is a large collection of de-correlated trees that are built and then averaged; it can be used for both regression and classification.
It is based on a group of n trees grown randomly, each of which outputs either a class for classification problems or a continuous value for regression problems. In our experiment we set the number of trees to n=200; both n=100 and n=300 yielded lower performance. The idea is to reduce variance, since the bias of averaged bagged trees is the same as that of an individual tree [7]. Random forests are similar to CART in that they capture high-order interactions between variables and handle mixed predictors. They have two advantages over CART: they are less prone to overfitting and, because the out-of-bag samples provide an error estimate, a form of cross-validation is already built in.
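The bagging and out-of-bag ideas above can be illustrated with a small sketch. It is a simplification under stated assumptions, not the method used in the study: each "tree" is a one-split regression stump, there is a single predictor (so the random selection of predictors at each split, which distinguishes a random forest from plain bagging, does not apply), and the names `fit_stump` and `bagged_oob` and the simulated data are hypothetical. Only the tree count of 200 is taken from the text.

```python
import random
from statistics import mean


def fit_stump(xs, ys):
    """Fit a one-split regression tree (stump) by minimising squared error."""
    best = None
    distinct = sorted(set(xs))
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        ml, mr = mean(left), mean(right)
        sse = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, ml, mr)
    if best is None:  # all x values identical: fall back to the mean
        m = mean(ys)
        return lambda x: m
    _, t, ml, mr = best
    return lambda x: ml if x <= t else mr


def bagged_oob(xs, ys, n_trees=200, seed=0):
    """Grow n_trees stumps on bootstrap samples; return them with the OOB MSE."""
    rng = random.Random(seed)
    n = len(xs)
    oob_preds = [[] for _ in range(n)]
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        in_bag = set(idx)
        tree = fit_stump([xs[i] for i in idx], [ys[i] for i in idx])
        trees.append(tree)
        # Each observation is scored only by trees that never saw it.
        for i in range(n):
            if i not in in_bag:
                oob_preds[i].append(tree(xs[i]))
    scored = [(mean(p), ys[i]) for i, p in enumerate(oob_preds) if p]
    oob_mse = mean((p - y) ** 2 for p, y in scored)
    return trees, oob_mse


# Simulated data (hypothetical): a noisy step function of one predictor.
data_rng = random.Random(1)
xs = [i / 10 for i in range(40)]
ys = [(1.0 if x > 2.0 else 0.0) + data_rng.gauss(0, 0.1) for x in xs]

trees, oob_mse = bagged_oob(xs, ys, n_trees=200)
# The ensemble prediction for regression is the average over all trees.
forest_prediction = mean(tree(3.0) for tree in trees)
```

Because every observation is left out of roughly a third of the bootstrap samples, `oob_mse` estimates the prediction error on unseen data without a separate hold-out set.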
Finally, boosting fits additional predictors to the residuals from initial predictions [9].
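The residual-fitting idea can be sketched as least-squares boosting with regression stumps. This is an assumed, simplified variant for illustration only: the text does not specify the base learner, the loss, the number of rounds or a learning rate, so `fit_stump`, `boost`, `n_rounds=50` and `learning_rate=0.1` are hypothetical choices.

```python
from statistics import mean


def fit_stump(xs, ys):
    """Fit a one-split regression tree (stump) by minimising squared error."""
    best = None
    distinct = sorted(set(xs))
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        ml, mr = mean(left), mean(right)
        sse = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, ml, mr)
    if best is None:  # all x values identical: fall back to the mean
        m = mean(ys)
        return lambda x: m
    _, t, ml, mr = best
    return lambda x: ml if x <= t else mr


def boost(xs, ys, n_rounds=50, learning_rate=0.1):
    """Start from the mean, then repeatedly fit a stump to the current residuals."""
    base = mean(ys)
    stumps = []
    preds = [base] * len(xs)
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        # Shrink each correction so that no single stump dominates.
        preds = [p + learning_rate * stump(x) for p, x in zip(preds, xs)]

    def predict(x):
        return base + learning_rate * sum(s(x) for s in stumps)

    return predict


# Toy data (hypothetical): a smooth nonlinear target.
xs = [i / 2 for i in range(20)]
ys = [x ** 2 for x in xs]

predict = boost(xs, ys)
baseline_mse = mean((y - mean(ys)) ** 2 for y in ys)
boosted_mse = mean((y - predict(x)) ** 2 for x, y in zip(xs, ys))
```

Each round corrects part of what the current ensemble still gets wrong, so on this toy data the training error `boosted_mse` falls below the constant-mean baseline `baseline_mse`.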