Investigation into linear regression influence measure diagnostics and bootstrap-based inferential techniques
Since extreme or influential observations can drastically affect the fit of a regression model, their detection plays an important role in regression model fitting. Many traditional diagnostic techniques employ single-case deletion methods for this purpose, but these have several drawbacks, such as an inability to detect masked or swamped influential cases. Recently, new simulation-based techniques have been developed to overcome these problems, such as the ADAP technique proposed by Roberts et al. (2015). However, this method lacks a formal or data-driven choice for the cut-off values used in the procedure. Another recent technique, attributed to Martin and Roberts (2010), attempts to improve on the traditional single-case deletion methods by using a Jackknife-after-bootstrap approach to find data-driven cut-off values for traditional diagnostic statistics. In this dissertation, we combine these two approaches and use the Jackknife-after-bootstrap method to find data-driven cut-off values for the ADAP method, thereby potentially improving the latter. Additionally, a completely new method is proposed for the detection of influential observations, based on a simple approximation to the traditional Cook's distance diagnostic measure. In this way, a new cut-off value for Cook's distance is obtained, which can be used as an alternative to the traditional rule-of-thumb cut-off values. An intensive simulation study is presented that compares the newly proposed 'combined' approach and the new method based on the approximation to Cook's distance with the traditional Cook's distance diagnostic measure, the plain implementation of the Jackknife-after-bootstrap method, the plain implementation of the ADAP method, and a commonly used modern method called MC3.
Disappointingly, our results show that while the newly proposed combined method does fare better than the traditional methods, the MC3 method, and the plain implementation of the Jackknife-after-bootstrap method, it is extremely time-consuming and does not improve on either the plain implementation of the ADAP method or the newly proposed Cook's distance-based method. The new Cook's distance-based method, on the other hand, performs better than expected. It outperforms the traditional Cook's distance method of obtaining cut-off values, the MC3 method, the plain implementation of the Jackknife-after-bootstrap method, and the newly proposed 'combined' method, and in some cases it performs as well as the plain implementation of the ADAP method, although the plain implementation of the ADAP method is still the best of the discussed methods overall.
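
For readers unfamiliar with the baseline against which the methods above are compared, the following is a minimal sketch of the traditional Cook's distance diagnostic with a rule-of-thumb cut-off. It is not the dissertation's code: the function names, the simulated data, and the choice of the common 4/n rule of thumb are all assumptions made for illustration, and neither the ADAP, Jackknife-after-bootstrap, MC3, nor the newly proposed approximation is implemented here.

```python
import numpy as np


def cooks_distance(X, y):
    """Cook's distance for every case of an OLS fit.

    X is the n x p design matrix (including the intercept column).
    Uses the closed form D_i = e_i^2 / (p * s^2) * h_ii / (1 - h_ii)^2.
    """
    n, p = X.shape
    # Leverages h_ii are the squared row norms of Q in the thin QR of X.
    Q, _ = np.linalg.qr(X)
    h = np.sum(Q**2, axis=1)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    s2 = resid @ resid / (n - p)  # residual variance estimate
    return resid**2 / (p * s2) * h / (1.0 - h) ** 2


def cooks_distance_by_deletion(X, y):
    """Same quantity computed literally by single-case deletion (slow)."""
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    s2 = resid @ resid / (n - p)
    d = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        beta_i, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        shift = X @ (beta_i - beta)  # change in all fitted values
        d[i] = shift @ shift / (p * s2)
    return d


# Hypothetical simulated data with one planted influential case.
rng = np.random.default_rng(0)
n = 40
x = rng.uniform(0, 10, n)
y = 1.0 + 2.0 * x + rng.normal(0, 1, n)
x[0], y[0] = 25.0, 0.0  # high leverage and far from the true line
X = np.column_stack([np.ones(n), x])

d = cooks_distance(X, y)
cutoff = 4.0 / n  # assumed rule of thumb; other conventions exist
flagged = np.flatnonzero(d > cutoff)
print("flagged cases:", flagged)
```

The deletion-based function is included only to show that the closed form and the literal leave-one-out refit agree. Because each case is removed one at a time, a cluster of influential cases can hide one another, which is the masking problem that motivates the simulation-based alternatives discussed above.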