A new method for transforming data to normality with application to density estimation
One of the main objectives of this dissertation is to derive efficient non-parametric estimators for an unknown density f. It is well known that the ordinary kernel density estimator has, despite several good properties, some drawbacks. For example, it suffers from boundary bias and it also exhibits spurious bumps in the tails. Various solutions to overcome these defects are presented in this study, including the application of a transformation kernel density estimator. The latter estimator (if implemented correctly) is pursued as a simultaneous solution for both boundary bias and spurious bumps in the tails. Among other properties, the estimator is also able to detect and estimate density modes more effectively. To apply the transformation kernel density estimator, an effective transformation of the data is required. To achieve this objective, an extensive discussion of parametric transformations introduced and studied in the literature is presented first, emphasizing the practical feasibility of these transformations. Second, known methods of estimating the parameters associated with these transformations are discussed (e.g. profile maximum likelihood), and two new estimation techniques, referred to as the minimum residual and minimum distance methods, are introduced. Furthermore, new procedures are developed to select a parametric transformation that is suitable for a given set of data. Finally, utilizing the above techniques, the desired optimal transformation to any target distribution (e.g. the normal distribution) is introduced, which has the property that it can also be iterated. A polynomial approximation of the optimal transformation function is presented. It is shown that the performance of this transformation exceeds that of any transformation available in the literature.
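For orientation, the general form of a transformation kernel density estimator can be written out as below. This is the standard change-of-variables construction, not a formula quoted from the dissertation. The symbols T (a strictly increasing, differentiable transformation), K (a kernel), h (a bandwidth), F (the continuous distribution function of the data) and \Phi (the standard normal distribution function) are notational assumptions introduced here. The last line uses the probability integral transform as one example of a transformation to normality, and the dissertation's optimal transformation may be defined differently.

```latex
\begin{align*}
  % Change of variables for Y = T(X), with T strictly increasing:
  f_X(x) &= f_Y\bigl(T(x)\bigr)\, T'(x), \\
  % Plugging in a kernel estimate of f_Y built from Y_i = T(X_i):
  \hat f_X(x) &= T'(x)\, \frac{1}{nh} \sum_{i=1}^{n}
      K\!\left( \frac{T(x) - T(X_i)}{h} \right), \\
  % Probability integral transform to a standard normal target:
  T(x) &= \Phi^{-1}\bigl(F(x)\bigr)
      \quad\Longrightarrow\quad T(X) \sim N(0, 1).
\end{align*}
```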
In the context of transformation kernel density estimation, we present a comprehensive literature study of the methods currently available and then introduce the new semi-parametric transformation estimation procedure based on the optimal transformation of data to normality. However, application of the optimal transformation in this context requires special attention. To create a density estimator that automatically addresses both boundary bias and spurious bumps in the tails, a generalized bandwidth adaptation procedure is developed, which is applied in conjunction with a newly developed constant shift procedure. Furthermore, the optimal transformation function is based on a kernel distribution function estimator. A new data-based smoothing parameter (bandwidth selector) is proposed, and it is shown that this selector performs better than a well-established bandwidth selector proposed in the literature. To evaluate the performance of the newly proposed semi-parametric transformation estimation procedure, a simulation study is presented based on densities that cover a wide range of forms. Some of the main results of the Monte Carlo simulation study are that:
* the proposed optimal transformation function can take on all the possible shapes of a parametric transformation, as well as any combination of these shapes, which results in high p-values when testing normality of the transformed data.
* the new minimum residual and minimum distance techniques contribute to better transformations to normality when a parametric transformation is applicable.
* the newly proposed semi-parametric transformation kernel density estimator performs well for unimodal, low and high kurtosis densities. Moreover, it estimates densities with much curvature (e.g. modes and valleys) more effectively than existing procedures in the literature.
* the new transformation density estimator does not exhibit spurious bumps in the tail regions.
* boundary bias is addressed automatically.
In conclusion, practical examples based on real-life data are presented.
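The idea of a transformation kernel density estimator built on a kernel distribution function estimator can be illustrated with a minimal Python sketch. This is not the dissertation's procedure: it assumes a Gaussian kernel, uses a rule-of-thumb bandwidth (1.06 times the sample standard deviation times n^(-1/5)) in place of the new bandwidth selector, and omits the bandwidth adaptation and constant shift steps. All function names (`kernel_cdf`, `kernel_pdf`, `rule_of_thumb`, `transformation_kde`) are hypothetical.

```python
import numpy as np
from scipy.stats import norm


def kernel_cdf(x, data, h):
    """Gaussian-kernel estimate of the distribution function at points x."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    return norm.cdf((x[:, None] - data[None, :]) / h).mean(axis=1)


def kernel_pdf(x, data, h):
    """Ordinary Gaussian kernel density estimate at points x."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    return norm.pdf((x[:, None] - data[None, :]) / h).mean(axis=1) / h


def rule_of_thumb(data):
    """Assumed stand-in bandwidth: 1.06 * sigma * n^(-1/5)."""
    return 1.06 * data.std(ddof=1) * len(data) ** (-0.2)


def transformation_kde(x, data, eps=1e-10):
    """Estimate the density of `data` at x via a transformation to normality."""
    h0 = rule_of_thumb(data)

    def T(u):
        # T(u) = Phi^{-1}(F_hat(u)); clip so the normal quantile stays finite.
        p = np.clip(kernel_cdf(u, data, h0), eps, 1.0 - eps)
        return norm.ppf(p)

    def dT(u):
        # Chain rule: T'(u) = f_hat(u) / phi(T(u)).
        return kernel_pdf(u, data, h0) / norm.pdf(T(u))

    y = T(data)                # transformed sample, roughly normal
    h1 = rule_of_thumb(y)      # bandwidth on the transformed scale
    # Back-transform: f_X(x) = f_Y(T(x)) * T'(x).
    return kernel_pdf(T(x), y, h1) * dT(x)


# Usage on a skewed sample with a boundary at zero (simulated, illustrative).
rng = np.random.default_rng(0)
data = rng.lognormal(mean=0.0, sigma=1.0, size=300)
grid = np.linspace(data.min() - 5.0, data.max() + 5.0, 4000)
est = transformation_kde(grid, data)
# Trapezoidal rule; the area under a density estimate should be close to 1.
area = np.sum((est[1:] + est[:-1]) / 2.0 * np.diff(grid))
```

Because the estimate on the transformed scale is multiplied by the derivative of the transformation, the result integrates to approximately one whatever the quality of the transformation. How well it handles boundaries and tails depends on the choices that this sketch simplifies.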