Boosting, bagging and bragging applied to nonparametric regression: an empirical approach
Abstract
The purpose of this study is to determine the effect of improvement methods such as boosting, bagging and bragging (a variant of bagging), as well as combinations of these methods, on nonparametric kernel regression. The improvement methods are applied to the Nadaraya-Watson (N-W) kernel regression estimator, with the bandwidth tuned by minimizing the cross-validation function. It is well known that the N-W estimator suffers from variance-related drawbacks. Marzio and Taylor (2008), Hall and Robinson (2009) and Swanepoel (1988, 1990) introduced boosting, bagging and bragging methods to the field of kernel regression. In the current study, combinations of boosting, bagging and bragging are explored to determine their effect on the variability of the N-W regression estimator. A variety of methods are used to determine the bandwidth by minimizing the cross-validation function, and the resulting regression estimates are evaluated by means of the global MISE (mean integrated squared error) discrepancy measure.
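For reference, the quantities named above have the following standard forms, with kernel K, bandwidth h and true regression function m; the leave-one-out form of the cross-validation function is shown as an assumption about the specific variant used:
\[
\hat{m}_h(x) = \frac{\sum_{i=1}^{n} K\{(x - X_i)/h\}\, Y_i}{\sum_{j=1}^{n} K\{(x - X_j)/h\}},
\qquad
CV(h) = \frac{1}{n}\sum_{i=1}^{n} \{Y_i - \hat{m}_{h,-i}(X_i)\}^2,
\]
\[
\mathrm{MISE}(\hat{m}_h) = E\!\int \{\hat{m}_h(x) - m(x)\}^2\, dx,
\]
where \(\hat{m}_{h,-i}\) denotes the estimator computed with the i-th observation left out.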
Boosting is a general method for improving the accuracy of any given learning algorithm and has its roots in machine learning. Owing to various authors' contributions to the methodology and theory of boosting, its applications have expanded to a wide range of fields; in particular, boosting has been shown in the literature to improve the Nadaraya-Watson learning algorithm. Bagging, an acronym for bootstrap aggregating, generates multiple versions of a predictor and uses these replicates to obtain an aggregated estimator. In the regression setting, the aggregation is an average over the multiple versions obtained by applying the bootstrap principle, i.e. by drawing bootstrap samples from the original training set and using them as new training sets (Swanepoel 1988, 1990; Breiman 1996a). We also apply modifications of the method such as bragging, where a robust estimator, rather than the average, is computed from the bootstrap samples. Boosting, bagging and bragging can all be seen as ensemble methods: they train multiple component learners and combine their predictions, and the generalization ability of an ensemble is often significantly better than that of a single learner. Results and conclusions confirming existing literature are provided, as well as new results for the new methods.
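As an illustration of how boosting can be applied to the N-W learning algorithm, the following is a minimal L2-boosting-style sketch in Python. It is not the authors' implementation; the Gaussian kernel, the step size nu and the number of iterations are illustrative assumptions.

import numpy as np

def nw_smooth(x_eval, x, r, h):
    """Nadaraya-Watson smooth of responses r, evaluated at the points x_eval."""
    u = (x_eval[:, None] - x[None, :]) / h
    w = np.exp(-0.5 * u**2)          # Gaussian kernel weights (illustrative choice)
    return (w @ r) / w.sum(axis=1)

def boosted_nw(x_grid, x, y, h, n_iter=10, nu=0.5):
    """L2-boosting-style fit: repeatedly smooth the residuals and accumulate the smooths."""
    fit_train = np.zeros_like(y, dtype=float)   # current fit at the design points
    fit_grid = np.zeros(len(x_grid))            # current fit on the evaluation grid
    for _ in range(n_iter):
        resid = y - fit_train                   # residuals of the current fit
        fit_train += nu * nw_smooth(x, x, resid, h)
        fit_grid += nu * nw_smooth(x_grid, x, resid, h)
    return fit_grid

# Example usage on simulated data (noisy sine curve):
rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0, 2 * np.pi, 200))
y = np.sin(x) + rng.normal(scale=0.3, size=x.size)
grid = np.linspace(0, 2 * np.pi, 100)
fit = boosted_nw(grid, x, y, h=0.3)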
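The bootstrap aggregation described above can be sketched as follows, again as an illustrative Python sketch rather than the study's code; the pointwise median as the robust bragging aggregate, and all function and parameter names, are assumptions made for the example.

import numpy as np

def nw_estimate(x_grid, x, y, h):
    """Nadaraya-Watson estimate on x_grid with a Gaussian kernel and bandwidth h."""
    u = (x_grid[:, None] - x[None, :]) / h
    w = np.exp(-0.5 * u**2)
    return (w @ y) / w.sum(axis=1)

def bagged_nw(x_grid, x, y, h, n_boot=200, robust=False, seed=None):
    """Bagging (pointwise mean) or bragging (pointwise median) of bootstrap N-W fits."""
    rng = np.random.default_rng(seed)
    n = len(x)
    fits = np.empty((n_boot, len(x_grid)))
    for b in range(n_boot):
        idx = rng.integers(0, n, n)              # bootstrap sample of the training pairs
        fits[b] = nw_estimate(x_grid, x[idx], y[idx], h)
    # Bagging aggregates by the pointwise mean; bragging replaces it by the median.
    return np.median(fits, axis=0) if robust else fits.mean(axis=0)

# Example usage on simulated data (noisy sine curve):
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 2 * np.pi, 200))
y = np.sin(x) + rng.normal(scale=0.3, size=x.size)
grid = np.linspace(0, 2 * np.pi, 100)
bagged = bagged_nw(grid, x, y, h=0.3)                  # bagging
bragged = bagged_nw(grid, x, y, h=0.3, robust=True)    # bragging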