NWU Institutional Repository

Estimation of Missing Data in Ecological Studies Using Machine Learning Techniques

dc.contributor.advisor: Sonono, M.E.
dc.contributor.advisor: Takaidza, I.
dc.contributor.author: Sidumo, Bonelwa
dc.contributor.researchID: 23756144 - Sonono, Masimba Energy (Supervisor)
dc.contributor.researchID: 27605329 - Takaidza, Isaac (Supervisor)
dc.date.accessioned: 2023-07-31T06:03:33Z
dc.date.available: 2023-07-31T06:03:33Z
dc.date.issued: 2023
dc.description: PhD (Operational Research), North-West University, Vanderbijlpark Campus [en_US]
dc.description.abstract: This study focuses on the estimation of zero-inflated, overdispersed count data, which are common in ecological studies and pose a major challenge in population studies. The novelty of this study lies in demonstrating that ignoring overdispersion leads to incorrect parameter estimates, which may result in researchers identifying predictors as having biological importance when in fact they do not. Hence, the assessment of overdispersion is very important for count data. This study uses machine learning (ML) techniques to try to reduce overdispersion in ecological studies. The thesis consists of three articles, presented in separate chapters, that address the main objectives of the study. Chapter 2 is a review that explores ML techniques for predicting multiple species. Its purpose is to gain insight into how ML techniques can be used to predict the relationship between species abundance and habitat suitability. Most work in the literature has focused on species distribution models (SDMs) for predicting the relationship between a single species and its relevant environmental covariates, with very few studies on the prediction of multiple species. To illustrate the weaknesses and limitations of SDMs, the predictive performance of the proposed ML techniques is evaluated on a multiple-species dataset. The findings demonstrate that ML techniques accurately predict the relationship between multiple-species abundance and habitat suitability. Chapter 3 presents count regression and ML techniques for zero-inflated, overdispersed count data, with an application to ecological data. The problem of overdispersion in ecological count data is investigated and addressed using ML regression techniques. In this chapter, the performance of statistical count regression models is compared with that of the proposed ML regression techniques. The experiment is conducted on a single fish species, Lates niloticus. The mean absolute error (MAE) is used to compare the performance of the count regression models and the ML regression models. The results suggest that the ML regression models outperform the statistical count regression models and that ML regression techniques can be used to model ecological count data. Chapter 4 presents an approach to the multi-class imbalance problem in ecology using machine learning. In this chapter, we use ML approaches to classify multiple species in population ecology, with the goal of ascertaining the performance of ML classifiers on multiple species. When classifying multiple species, a multi-class imbalanced classification problem arises. To overcome it, bagging and boosting classifiers are used in combination with resampling techniques, and their performances are compared. The recall and F1-score performance metrics are used to select the best classifier for the dataset. The bagging classifiers outperform the boosting classifiers. The findings of this work suggest that bagging classifiers can be used to classify multiple species in ecological studies. [en_US]
dc.description.thesistype: Doctoral [en_US]
dc.identifier.uri: https://orcid.org/0000-0002-4267-9651
dc.identifier.uri: http://hdl.handle.net/10394/41884
dc.language.iso: en [en_US]
dc.publisher: North-West University (South Africa) [en_US]
dc.subject: Count data [en_US]
dc.subject: Ecology [en_US]
dc.subject: Classifiers [en_US]
dc.subject: Imbalanced dataset [en_US]
dc.subject: Machine learning [en_US]
dc.subject: Multi-class classification [en_US]
dc.subject: Multiple species [en_US]
dc.subject: Overdispersion [en_US]
dc.subject: Single species [en_US]
dc.subject: Species abundance [en_US]
dc.subject: Species distribution models [en_US]
dc.subject: Zero-inflation [en_US]
dc.title: Estimation of Missing Data in Ecological Studies Using Machine Learning Techniques [en_US]
dc.type: Thesis [en_US]

Files

Original bundle

Name:
Sidumo_B.pdf
Size:
13.96 MB
Format:
Adobe Portable Document Format

License bundle

Name:
license.txt
Size:
1.61 KB
Format:
Item-specific license agreed upon to submission