An experimental analysis of the effect of sample size on the efficiency of count data models
Abstract
Many multivariate analysts hold the view that larger samples yield more efficient models. However, this claim has not been verified for count data models. This study carried out an experimental analysis of the effect of sample size on the efficiency of the Poisson regression model (PRM), negative binomial regression model (NBRM), zero-inflated Poisson (ZIP) model, zero-inflated negative binomial (ZINB) model, Poisson hurdle model (PHM) and negative binomial hurdle model (NBHM). The study comprised two parts (Part A and Part B). The data used in Part A were sourced from Data First and were collected by Statistics South Africa through the Marriages and Divorces database.
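To make the differences between the six model families concrete, the sketch below evaluates their probability mass functions in plain Python. It is a minimal, intercept-only illustration: the regression versions used in the study link the mean (and, for the zero-inflated and hurdle models, the zero probability) to covariates, and that step is omitted here. The NB2 parameterisation (variance μ + αμ²), the function names and the parameter values (μ = 2, α = 0.5, inflation probability π = 0.3, hurdle zero probability 0.4) are assumptions chosen for illustration, not the study's settings.

```python
import math
from functools import partial


def poisson_pmf(y: int, mu: float) -> float:
    """Poisson P(Y = y) with mean mu (variance equals mu)."""
    return math.exp(y * math.log(mu) - mu - math.lgamma(y + 1))


def negbin_pmf(y: int, mu: float, alpha: float) -> float:
    """NB2 P(Y = y) with mean mu and variance mu + alpha * mu**2 (assumed form)."""
    r = 1.0 / alpha  # shape ("size") parameter
    return math.exp(math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
                    + r * math.log(r / (r + mu)) + y * math.log(mu / (r + mu)))


def zero_inflated_pmf(y: int, pi: float, base) -> float:
    """Mixture: a structural zero with probability pi, otherwise a draw from base."""
    p = (1.0 - pi) * base(y)
    return p + pi if y == 0 else p


def hurdle_pmf(y: int, p_zero: float, base) -> float:
    """Zero with probability p_zero; positive counts from the zero-truncated base."""
    if y == 0:
        return p_zero
    return (1.0 - p_zero) * base(y) / (1.0 - base(0))


# Illustrative parameter values (hypothetical, not estimates from the study).
mu, alpha, pi, p_zero = 2.0, 0.5, 0.3, 0.4
pois = partial(poisson_pmf, mu=mu)
nb = partial(negbin_pmf, mu=mu, alpha=alpha)

models = {
    "PRM": pois,
    "NBRM": nb,
    "ZIP": partial(zero_inflated_pmf, pi=pi, base=pois),
    "ZINB": partial(zero_inflated_pmf, pi=pi, base=nb),
    "PHM": partial(hurdle_pmf, p_zero=p_zero, base=pois),
    "NBHM": partial(hurdle_pmf, p_zero=p_zero, base=nb),
}

for name, pmf in models.items():
    print(f"{name:5s} P(Y=0) = {pmf(0):.3f}")
```

Zero-inflated models add extra zeros on top of the count distribution, whereas hurdle models replace the zero probability entirely and draw positive counts from a zero-truncated distribution; this is why P(Y = 0) for the two hurdle models equals the chosen hurdle probability exactly.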
In Part A, the six models were applied to ten random samples selected from the Marriages and Divorces dataset. The sample sizes ranged from 4 392 to 43 916, increasing in steps of 10% of the largest sample (approximately 4 392 observations). Part B applied the six models to five simulated datasets with sizes ranging from 50 000 to 1 000 000. The models were compared using the Akaike Information Criterion (AIC), Bayesian Information Criterion (BIC), Vuong’s test, McFadden’s pseudo-R², Mean Squared Error (MSE) and Mean Absolute Deviation (MAD). The results from Part A revealed that, in general, the negative binomial-based models outperformed the Poisson-based models. However, Part A did not reveal a clear effect of sample size on model efficiency, because the values of the model comparison criteria did not change consistently as the sample size increased. The results from Part B were inconclusive and therefore did not support any meaningful conclusion.
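The comparison criteria listed above can all be computed from a fitted model's log-likelihood and fitted values. The following is a minimal Python sketch under stated assumptions: the helper `part_a_sample_sizes` is hypothetical and assumes the Part A samples were 10%, 20%, and so on up to 100% of 43 916 records (consistent with the stated range); MAD is taken to mean the mean absolute difference between observed and fitted counts; and Vuong's statistic is the uncorrected version. The toy counts are invented for illustration and are not data from the study, and the NB2 dispersion is fixed rather than estimated.

```python
import math
from statistics import fmean, stdev


def part_a_sample_sizes(n_total: int = 43916, steps: int = 10) -> list:
    """Hypothetical helper: sample sizes growing in equal 10% steps of n_total."""
    return [round(n_total * k / steps) for k in range(1, steps + 1)]


def aic(loglik: float, k: int) -> float:
    """Akaike Information Criterion, 2k - 2 ln L (lower is better)."""
    return 2 * k - 2 * loglik


def bic(loglik: float, k: int, n: int) -> float:
    """Bayesian Information Criterion, k ln n - 2 ln L (lower is better)."""
    return k * math.log(n) - 2 * loglik


def mcfadden_r2(loglik: float, loglik_null: float) -> float:
    """McFadden's pseudo-R^2, 1 - ln L_model / ln L_null (intercept-only null)."""
    return 1.0 - loglik / loglik_null


def mse(y, yhat) -> float:
    """Mean squared error between observed and fitted counts."""
    return fmean([(a - b) ** 2 for a, b in zip(y, yhat)])


def mad(y, yhat) -> float:
    """Mean absolute difference between observed and fitted counts (assumed MAD)."""
    return fmean([abs(a - b) for a, b in zip(y, yhat)])


def vuong(logp1, logp2) -> float:
    """Uncorrected Vuong z-statistic from per-observation log-likelihoods.
    Large positive values favour model 1; large negative values favour model 2."""
    m = [a - b for a, b in zip(logp1, logp2)]
    return math.sqrt(len(m)) * fmean(m) / stdev(m)


def poisson_logpmf(y, mu):
    return y * math.log(mu) - mu - math.lgamma(y + 1)


def nb2_logpmf(y, mu, alpha):
    r = 1.0 / alpha
    return (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
            + r * math.log(r / (r + mu)) + y * math.log(mu / (r + mu)))


# Toy usage on hypothetical counts (not data from the study).
y = [0, 0, 1, 2, 0, 3, 1, 0, 4, 0]
mu = fmean(y)  # intercept-only maximum-likelihood estimate of the mean
lp_pois = [poisson_logpmf(v, mu) for v in y]
lp_nb = [nb2_logpmf(v, mu, alpha=0.5) for v in y]  # alpha fixed, not estimated

print(part_a_sample_sizes())
print("Poisson AIC:", aic(sum(lp_pois), k=1), "BIC:", bic(sum(lp_pois), k=1, n=len(y)))
# k = 2 counts mu and alpha, as it would if alpha had been estimated.
print("NB2 AIC:", aic(sum(lp_nb), k=2), "BIC:", bic(sum(lp_nb), k=2, n=len(y)))
print("Vuong (NB2 vs Poisson):", vuong(lp_nb, lp_pois))
print("MSE:", mse(y, [mu] * len(y)), "MAD:", mad(y, [mu] * len(y)))
```

Lower AIC, BIC, MSE and MAD indicate a better model, as does a higher McFadden's pseudo-R². A Vuong statistic above about 1.96 in absolute value favours one of the two non-nested models at the 5% significance level.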