Estimation of Missing Data in Ecological Studies Using Machine Learning Techniques
Publisher
North-West University (South Africa)
Abstract
This study focuses on the estimation of zero-inflated, overdispersed count data,
which are common in ecological studies and pose a major challenge in population studies.
The novelty of this study lies in demonstrating that ignoring overdispersion leads to
incorrect parameter estimates, which may result in researchers identifying predictors
as biologically important when in fact they are not. Hence, the assessment of
overdispersion is very important for count data. This study uses machine learning
(ML) techniques in an attempt to reduce overdispersion in ecological studies. This thesis
consists of three articles presented in separate chapters addressing the main objectives
of the study.
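
As an illustration of what assessing overdispersion and zero inflation can look like in practice, the following is a minimal sketch, not the method used in the thesis. It compares the sample variance with the sample mean (a Poisson model implies a ratio near 1) and the observed fraction of zeros with the fraction a Poisson model with the same mean would predict. The function name `dispersion_summary` and the count values are hypothetical and chosen only for illustration.

```python
import math
from statistics import mean, variance


def dispersion_summary(counts):
    """Summarise overdispersion and zero inflation relative to a Poisson model."""
    m = mean(counts)
    v = variance(counts)  # sample variance (n - 1 denominator)
    observed_zero = sum(1 for c in counts if c == 0) / len(counts)
    poisson_zero = math.exp(-m)  # P(Y = 0) under Poisson with mean m
    return {
        "mean": m,
        "variance": v,
        "dispersion_index": v / m,  # values well above 1 suggest overdispersion
        "observed_zero_fraction": observed_zero,
        "poisson_zero_fraction": poisson_zero,
    }


# Hypothetical abundance counts (illustrative only, not data from the thesis).
counts = [0, 0, 0, 0, 0, 0, 1, 2, 3, 14]
summary = dispersion_summary(counts)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

In this made-up sample the variance is several times the mean and the share of zeros is far above the Poisson expectation, which is the pattern the abstract describes as zero-inflated and overdispersed.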
Chapter 2 is a review that explores machine learning techniques for predicting
multiple species. Its purpose is to gain insight into how machine learning
techniques can be used to predict the relationship between species abundance and
habitat suitability. Most work in the literature has focused on species distribution
models (SDM) that predict the relationship between a single species and its relevant
environmental covariates, with very few studies on the prediction of multiple species.
To illustrate the weaknesses and limitations of SDM, the predictive performance of
the proposed machine learning techniques is evaluated on a multiple-species dataset.
The findings demonstrate that ML techniques accurately predict
the relationship between multiple-species abundance and habitat
suitability.
Chapter 3 presents count regression and machine learning techniques for zero-inflated
overdispersed count data, with application to ecological data. The problem of overdispersion,
which is common in ecological count data, is investigated and addressed using
machine learning (ML) regression techniques. In this chapter, the performance of
statistical count regression models is compared with that of the proposed machine learning
regression techniques. The experiment is conducted on a single fish species, Lates
niloticus. The mean absolute error (MAE) is used to compare the performance
of the count regression models and the ML regression models. The results suggest that ML
regression models outperform the statistical count regression models, and that
ML regression techniques can be used to model ecological count datasets.
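
The MAE comparison described above can be sketched as follows. This is a minimal illustration under stated assumptions: the observed counts and the two sets of predictions are hypothetical, and `model_a` and `model_b` are placeholder names that do not correspond to any model fitted in the thesis.

```python
def mean_absolute_error(observed, predicted):
    """Average absolute difference between observed and predicted counts."""
    if not observed or len(observed) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


# Hypothetical observed counts and predictions (illustrative only).
observed = [0, 0, 3, 1, 7]
model_a = [1.0, 0.5, 2.0, 1.5, 4.0]  # placeholder for one candidate model
model_b = [0.0, 0.5, 3.5, 1.0, 6.0]  # placeholder for another candidate model

mae_a = mean_absolute_error(observed, model_a)
mae_b = mean_absolute_error(observed, model_b)
print(f"MAE model_a: {mae_a:.2f}")
print(f"MAE model_b: {mae_b:.2f}")
print("lower MAE:", "model_a" if mae_a < mae_b else "model_b")
```

A lower MAE means the predictions are, on average, closer to the observed counts, so the model with the smaller value would be preferred under this criterion.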
Chapter 4 presents a machine learning approach to the multi-class imbalance problem in ecology.
In this chapter, machine learning approaches are used to classify
multiple species in population ecology, with the goal of ascertaining the
performance of machine learning classifiers on multiple species. Classifying
multiple species gives rise to a multi-class imbalanced classification problem.
To overcome this problem, bagging and boosting classifiers
are combined with resampling techniques and their performance is
compared. Recall and F1-score are used to select the best
classifier for the dataset. The bagging classifiers outperform the boosting classifiers,
which suggests that bagging classifiers can be used to classify
multiple species in ecological studies.
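
To show why recall and F1-score are informative for imbalanced multi-class problems, the following minimal sketch computes both per class and as unweighted (macro) averages. It is an illustration only: the class labels `A`, `B`, `C` and the predictions are hypothetical, the helper names are not from the thesis, and the macro-averaging choice is an assumption rather than the averaging scheme the thesis necessarily used.

```python
from collections import Counter


def per_class_scores(y_true, y_pred):
    """Return recall and F1 for every class seen in the labels or predictions."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[label] = {"recall": recall, "f1": f1}
    return scores


def macro_average(scores, metric):
    """Unweighted mean of a metric across classes."""
    return sum(s[metric] for s in scores.values()) / len(scores)


# Hypothetical imbalanced labels: class C is rare and always misclassified.
y_true = ["A"] * 6 + ["B"] * 3 + ["C"]
y_pred = ["A"] * 6 + ["B", "B", "A"] + ["A"]

scores = per_class_scores(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
macro_recall = macro_average(scores, "recall")
macro_f1 = macro_average(scores, "f1")
print(f"accuracy: {accuracy:.3f}")
print(f"macro recall: {macro_recall:.3f}")
print(f"macro F1: {macro_f1:.3f}")
```

In this made-up example the overall accuracy looks reasonable even though the rare class is never recovered, whereas the macro-averaged recall and F1 drop noticeably, which is why such metrics are commonly preferred when classes are imbalanced.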
Description
PhD (Operational Research), North-West University, Vanderbijlpark Campus
