Source classification in deep radio surveys using machine learning techniques
Hosenie, Zafiirah Banon
Until now, radio galaxies have primarily been classified using the human neural system. The Square Kilometre Array (SKA) will, however, produce a very large amount of science data, extending into the multiple-petabyte range, so there is an urgent need to develop new, automated techniques to fully exploit the SKA data. Machine Learning (ML) techniques are currently used in several fields of astrophysics, and in this thesis we comprehensively explore ML as a way to distinguish point from extended sources (P-E) and to classify radio galaxies as belonging to Fanaroff-Riley class I or II (FRI-FRII).
Our first step was to classify radio sources based on their morphology using filtering methods. We used images from the Sydney University Molonglo Sky Survey (SUMSS) and compared two approaches: (i) the LULU operators and the Discrete Pulse Transform (DPT) algorithm with low- and high-pass filtering, which were successful only in classifying extended sources and are computationally expensive; and (ii) source extraction by applying a high-pass filter to the radio images, where Otsu thresholding and Gaussian filtering allowed us not only to extract extended sources but also to reduce computation time.
Our next approach was to classify P-E and FRI-FRII sources using various ML algorithms: the Multi-Layer Perceptron (MLP), Random Forest (RF), k-Nearest Neighbours (kNN) and Naive Bayes (NB), all of which require specific features of the radio images as inputs. We used shapelet analysis to decompose the radio images into their shapelet coefficients, which were then fed into the ML algorithms. For P-E discrimination, the neural network (MLP) was the most effective algorithm, with an accuracy of 89% and an area under the curve (AUC) of 93%. For FRI-FRII sources, the RF algorithm performed best, with an accuracy of 75% and an AUC of 74%.
The final stage of this thesis was to apply deep learning, in the form of a Convolutional Neural Network (CNN), to FRI-FRII source classification. For the first time in radio astronomy, we added a Generative Adversarial Network (GAN) to generate realistic-looking data that supplement the real data during training. The CNN+GAN algorithm performed better than both the RF algorithm and the CNN alone with standard data augmentation (flipping and rotation), yielding an accuracy of 84% and an AUC of 85%. This shows that combining GANs with convolutional networks is likely to add significant value to radio astronomy in the era of the SKA.
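As an illustration of the Otsu thresholding and Gaussian filtering step mentioned above, the following is a minimal pure-Python sketch and not the thesis code. The function names (`smooth`, `otsu_threshold`, `extract_mask`), the smoothing width `sigma`, the number of histogram bins and the synthetic test image are all assumptions made for this example. A real pipeline would more likely use an array library.

```python
import math

def gaussian_kernel(sigma, radius):
    # Normalised 1-D Gaussian weights on [-radius, radius].
    weights = [math.exp(-(x * x) / (2.0 * sigma * sigma))
               for x in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def smooth(image, sigma=1.0):
    # Separable Gaussian blur; edge pixels are clamped (repeated).
    radius = max(1, int(3 * sigma))
    kernel = gaussian_kernel(sigma, radius)
    rows, cols = len(image), len(image[0])

    def blur_line(line):
        n = len(line)
        return [sum(k * line[min(max(i + j - radius, 0), n - 1)]
                    for j, k in enumerate(kernel))
                for i in range(n)]

    horizontal = [blur_line(row) for row in image]
    columns = [blur_line([horizontal[r][c] for r in range(rows)])
               for c in range(cols)]
    return [[columns[c][r] for c in range(cols)] for r in range(rows)]

def otsu_threshold(image, bins=64):
    # Pick the threshold that maximises the between-class variance
    # of the pixel-intensity histogram (Otsu's criterion).
    pixels = [v for row in image for v in row]
    lo, hi = min(pixels), max(pixels)
    if hi == lo:
        return lo
    width = (hi - lo) / bins
    hist = [0] * bins
    for v in pixels:
        hist[min(int((v - lo) / width), bins - 1)] += 1
    centers = [lo + (i + 0.5) * width for i in range(bins)]
    total = len(pixels)
    sum_all = sum(h * c for h, c in zip(hist, centers))
    best_var, best_t = -1.0, lo
    w0, sum0 = 0, 0.0
    for i in range(bins - 1):
        w0 += hist[i]
        sum0 += hist[i] * centers[i]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        mu0, mu1 = sum0 / w0, (sum_all - sum0) / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, lo + (i + 1) * width
    return best_t

def extract_mask(image, sigma=1.0):
    # Smooth, threshold, and return a binary source mask.
    smoothed = smooth(image, sigma)
    t = otsu_threshold(smoothed)
    return [[1 if v > t else 0 for v in row] for row in smoothed]

# Synthetic 16x16 image: zero background with one bright 6x6 "source".
image = [[10.0 if 5 <= r <= 10 and 5 <= c <= 10 else 0.0
          for c in range(16)] for r in range(16)]
mask = extract_mask(image, sigma=1.0)
```

The mask marks pixels brighter than the Otsu threshold of the smoothed image. Separating point from extended sources would need a further step (for example, measuring the size of each connected region), which this sketch does not attempt.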