
dc.contributor.advisor | Van Vuuren, P.A. | en_US
dc.contributor.author | Snyman, W. | en_US
dc.date.accessioned | 2020-03-05T12:37:26Z
dc.date.available | 2020-03-05T12:37:26Z
dc.date.issued | 2019 | en_US
dc.identifier.uri | https://orcid.org/0000-0002-8649-1052 | en_US
dc.identifier.uri | http://hdl.handle.net/10394/34266
dc.description | MEng (Computer and Electronic Engineering), North-West University, Potchefstroom Campus
dc.description.abstract | Feature selection is crucial for improving the performance of predictive models, both in classification accuracy and in model training time. For high-dimensional data, feature selection becomes even more necessary in order to select adequate features that complement the chosen predictive model. Filter, wrapper and embedded techniques are among the most popular feature selection algorithms for addressing irrelevant and redundant features, which reduce the performance of classification models. Hybrid techniques that combine filter, wrapper and embedded methods are also common and are used to construct more robust feature selection algorithms. This study is dedicated to revealing the ongoing improvements in the field of feature selection and to dissecting six different feature selection algorithms for detailed insight into their success on high-dimensional data, specifically gene expression microarrays. The six algorithms in this study are: (i) three filter methods, mRMR (min-Redundancy Max-Relevance), FCBF# (Fast Correlation Based Filter) and ORFS (Orthogonal Relevance Feature Selection); (ii) two wrapper methods, FRBPSO (Fuzzy Rule Based Particle Swarm Optimisation) and SVM-RFE (Support Vector Machine-Recursive Feature Elimination); and (iii) one embedded method, SBMLR (Sparse Multinomial Logistic Regression via Bayesian L1 regularisation). The three filter methods are adapted into suitable hybrid techniques, and multiple association measures are explored to determine the best performance for each algorithm. All algorithms incorporate the pre-processing techniques MDL discretisation and SIS in order to explore their improvements and shortcomings. Each algorithm's performance is assessed by its ability to improve classification accuracy with as few features as possible, and the algorithms are compared with one another. This comparison reveals the algorithms best suited for classification improvement, computation speed and feature removal capability. Thereafter, a case study involving plant foliage features, in which the number of features greatly outnumbers the number of samples (denoted p >> n), is used to complement the findings. The use of pre-processing techniques proved crucial for improving classification accuracy and reducing computation time. Of the six algorithms, mRMR and SVM-RFE proved the most promising. | en_US
dc.language.iso | en | en_US
dc.publisher | North-West University (South Africa) | en_US
dc.subject | Feature selection | en_US
dc.subject | classification | en_US
dc.subject | support vector machines | en_US
dc.subject | particle swarm optimisation | en_US
dc.subject | mutual information | en_US
dc.title | A critical comparison of feature selection algorithms for improved classification accuracy | en_US
dc.type | Thesis | en_US
dc.description.thesistype | Masters | en_US
dc.contributor.researchID | 10732926 - Van Vuuren, Pieter Andries (Supervisor) | en_US

