NWU Institutional Repository

Estimation of Missing Data in Ecological Studies Using Machine Learning Techniques

Publisher

North-West University (South Africa)

Abstract

This study focuses on the estimation of zero-inflated, overdispersed count data, which are common in ecological studies and pose a major challenge in population studies. Its novelty lies in demonstrating that ignoring overdispersion leads to incorrect parameter estimates, which may lead researchers to identify predictors as biologically important when in fact they are not. Assessing overdispersion is therefore essential for count data. This study uses machine learning (ML) techniques to try to reduce overdispersion in ecological studies. The thesis consists of three articles, presented in separate chapters, that address the main objectives of the study.

Chapter 2 is a review that explores ML techniques for predicting multiple species. Its purpose is to gain insight into how ML techniques can be used to predict the relationship between species abundance and habitat suitability. Most of the literature has focused on species distribution models (SDMs) for predicting the relationship between a single species and its relevant environmental covariates, with very few studies on the prediction of multiple species. To illustrate the weaknesses and limitations of SDMs, the predictive performance of the proposed ML techniques is evaluated on a multiple-species dataset. The findings demonstrate that ML techniques provide accurate predictions of the relationship between multiple-species abundance and habitat suitability.

Chapter 3 presents count regression and ML techniques for zero-inflated, overdispersed count data, with an application to ecological data. The problem of overdispersion in ecological count data is investigated and addressed using ML regression techniques. In this chapter, the performance of statistical count regression models is compared with that of the proposed ML regression techniques. The experiment is conducted on a single fish species, Lates niloticus. The mean absolute error (MAE) is used to compare the count regression models with the ML regression models. The results suggest that the ML regression models outperform the statistical count regression models and that ML regression techniques can be used to model ecological count datasets.

Chapter 4 presents a machine learning approach to the multi-class imbalance problem in ecology. In this chapter, ML approaches are used to classify multiple species in population ecology, with the goal of ascertaining the performance of ML classifiers on multiple species. Classifying multiple species gives rise to a multi-class imbalanced classification problem. To overcome it, bagging and boosting classifiers are combined with resampling techniques and their performances are compared. Recall and F1-score are used to select the best classifier for the dataset. The bagging classifiers outperform the boosting classifiers, which suggests that bagging classifiers can be used to classify multiple species in ecological studies.
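As a rough illustration of two quantities the abstract relies on, the sketch below computes a simple overdispersion check (sample variance divided by the mean, which is close to 1 for Poisson-like counts), the share of zero counts, and the mean absolute error used to compare models. It is not taken from the thesis: the function names and the sample counts are hypothetical, and the thesis may assess overdispersion differently.

```python
from statistics import mean, variance


def dispersion_index(counts):
    """Sample variance divided by the mean.

    Values well above 1 suggest overdispersion relative to a Poisson model.
    """
    return variance(counts) / mean(counts)


def zero_fraction(counts):
    """Share of observations that are exactly zero."""
    return sum(1 for c in counts if c == 0) / len(counts)


def mean_absolute_error(observed, predicted):
    """Average absolute difference between observed and predicted counts."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


# Hypothetical catch counts per sampling site (illustrative only, not thesis data).
counts = [0, 0, 0, 0, 1, 0, 2, 0, 9, 0, 0, 14]

print("dispersion index:", round(dispersion_index(counts), 2))
print("zero fraction:", round(zero_fraction(counts), 2))
```

With counts like these, most sites record zero and a few record large catches, so the variance far exceeds the mean. This is the pattern that a plain Poisson regression handles poorly and that motivates comparing alternative models on a common error measure such as MAE.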

Description

PhD (Operational Research), North-West University, Vanderbijlpark Campus
