Creation of near infrared spectroscopy calibration algorithms for soil water content prediction
Abstract
Water is arguably the most important human resource, and fresh water is becoming increasingly scarce. Irrigation makes agriculture one of the most water-demanding sectors, which underscores the importance of effective management decisions within this sector. Accurate measurement of soil water content is paramount to productive decisions in agricultural water management. The problem, however, is that traditional methods of measuring soil water content are expensive and immobile. The neutron probe, for example, must be calibrated for each soil type and poses a health hazard to users, while tensiometers and electrical resistance probes are stationary and cannot be relocated to measure different areas. Near infrared spectroscopy (NIRS) can provide the rapid, accurate, and cost-effective method for measuring soil moisture content that is needed. NIRS, however, requires a calibration model to interpret its observations, and no such calibration models are freely available for South African soils. This study aimed to create such calibration algorithms for soil water content and dry bulk density (DBD) prediction using approximately 213 soil core samples taken at predetermined locations in five catchments in South Africa. Algorithms were also created for comparison with the only freely available calibration algorithms, those of the Open Soil Spectroscopy Library (OSSL). The samples were scanned with a portable, handheld NeoSpectra NIR scanner at different moisture contents, achieved by saturating the samples and then placing them in pressure chambers to gradually reduce the moisture at pressures of 33 kPa (drained upper limit), 100 kPa, 500 kPa, and 1500 kPa (lower limit). At each scan the soil was weighed to determine the gravimetric water content, which was later converted to volumetric water content using the bulk density.
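The conversion from gravimetric to volumetric water content at each weighing step can be sketched as follows. This is an illustrative Python sketch (the study's own code was written in R), and all numeric values in the example are hypothetical:

```python
def gravimetric_water_content(wet_mass_g, dry_mass_g):
    """Gravimetric water content: grams of water per gram of dry soil."""
    return (wet_mass_g - dry_mass_g) / dry_mass_g

def volumetric_water_content(theta_g, bulk_density, water_density=1.0):
    """Convert gravimetric to volumetric water content.

    theta_v = theta_g * (rho_b / rho_w); with densities in g/cm^3,
    the result is cm^3 of water per cm^3 of soil.
    """
    return theta_g * bulk_density / water_density

# Hypothetical sample: 120 g wet, 100 g oven-dry, bulk density 1.4 g/cm^3
theta_g = gravimetric_water_content(120.0, 100.0)   # 0.20 g/g
theta_v = volumetric_water_content(theta_g, 1.4)    # 0.28 cm^3/cm^3
```

Multiplying by the ratio of dry bulk density to water density is what makes the laboratory mass measurements comparable to the volumetric values predicted by the calibration algorithms.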
Calibration algorithms were then created on the R coding platform using the spectral data and were compared against the lab-determined volumetric water content. The calibration algorithms were trained on 75% of the dataset, selected by K-means clustering on the spectra. Calibration algorithms were built using the Random Forest, Cubist, and Partial Least Squares machine learning algorithms together with various pre-processing methods, including Savitzky-Golay, Multiplicative Scatter Correction, Standard Normal Variate, and normalisation. Validation was performed on the remaining 25% of the dataset, and the accuracy of the different machine learning and pre-processing combinations was evaluated using a range of statistical measures: the mean error, which measures the bias of the model; the root mean square error (RMSE), indicating overall model accuracy; the coefficient of determination (R2), showing the correlation between predicted and actual values; Lin's concordance coefficient (rhoC), an adapted correlation coefficient used to assess model precision and accuracy; the ratio of performance to deviation (RPD), which indicates predictive performance; and the ratio of performance to interquartile distance (RPIQ), which measures predictive performance and model robustness. The best results were obtained with Savitzky-Golay pre-processing paired with the Random Forest machine learning technique, with an RMSE of 6.9%, an R2 of 0.62, a bias of 0.75, a rhoC of 0.75, and an RPIQ of 1.8. Algorithms for DBD were also created, where a combination of outlier removal and Savitzky-Golay pre-processing with a Cubist model provided the best results: an RMSE of 0.16 g/cm3, an R2 of 0.72, a bias of 0.01, a rhoC of 0.82, and an RPIQ of 2.31.
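The K-means-based selection of the 75% calibration set can be illustrated with a minimal sketch. This is Python with a from-scratch k-means so the example is self-contained (the study used R); the number of clusters, iteration count, and random seed are arbitrary illustrative choices:

```python
import numpy as np

def kmeans_split(spectra, k=5, train_frac=0.75, n_iter=50, seed=0):
    """Split samples into calibration and validation sets.

    The spectra are clustered with a minimal k-means, and roughly
    train_frac of each cluster is assigned to the calibration set,
    so both sets span the spectral variability.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(spectra, float)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # assign each spectrum to its nearest cluster center
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(1)
        # move each center to the mean of its members
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(0)
    train_idx = []
    for j in range(k):
        members = np.flatnonzero(labels == j)
        rng.shuffle(members)
        n_train = max(1, int(round(train_frac * len(members))))
        train_idx.extend(members[:n_train])
    train = np.array(sorted(train_idx))
    test = np.setdiff1d(np.arange(len(X)), train)
    return train, test
```

Sampling within clusters rather than at random keeps rare spectral types represented in both the calibration and validation sets.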
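The validation statistics listed above can be computed as in the following sketch (Python/NumPy for illustration; the study's implementation was in R):

```python
import numpy as np

def validation_metrics(obs, pred):
    """Validation statistics for a calibration algorithm.

    obs, pred: 1-D arrays of lab-measured and model-predicted values.
    Assumes the predictions are not perfect (RMSE > 0).
    """
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    resid = pred - obs
    me = resid.mean()                          # mean error (bias)
    rmse = np.sqrt((resid ** 2).mean())        # overall accuracy
    r2 = np.corrcoef(obs, pred)[0, 1] ** 2     # squared Pearson correlation
    # Lin's concordance correlation coefficient (population moments)
    sx, sy = obs.var(), pred.var()
    sxy = ((obs - obs.mean()) * (pred - pred.mean())).mean()
    rhoc = 2 * sxy / (sx + sy + (obs.mean() - pred.mean()) ** 2)
    rpd = obs.std(ddof=1) / rmse               # performance-to-deviation ratio
    iqr = np.percentile(obs, 75) - np.percentile(obs, 25)
    rpiq = iqr / rmse                          # performance-to-IQR ratio
    return {"ME": me, "RMSE": rmse, "R2": r2,
            "rhoC": rhoc, "RPD": rpd, "RPIQ": rpiq}
```

Unlike R2, rhoC penalises systematic offsets between predicted and measured values, while RPD and RPIQ scale the RMSE by the spread of the reference data, which is what makes them useful for comparing calibrations across datasets.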
Calibration algorithms for volumetric water content and DBD were also created for each catchment, and the results showed a further overall improvement over the regional algorithms that used all of the samples. The best catchment calibration algorithm for water content was that of the upper Olifants catchment, with a low mean error and RMSE of 0.59 and 5.07% respectively, accompanied by a high rhoC, RPD, and RPIQ of 0.81, 1.85, and 1.94 respectively. The best catchment algorithm for DBD was that of the Sabie catchment, with a mean error of 0, a low RMSE of 0.08 g/cm3, and a high rhoC, RPD, and RPIQ of 0.96, 3.52, and 3.75 respectively. Algorithms were then created for water retention at 33 kPa and 1500 kPa and for bulk density using the regional dataset, for comparison with the freely available international OSSL algorithms. The validation set of each created algorithm was uploaded to the OSSL prediction service to obtain predicted values, which were then compared with the predictions of the created algorithms. Although the created algorithms predicted poorly, they were still superior to the OSSL calibrations. The OSSL algorithm for water retention at the drained upper limit gave poor results, with an RMSE of 17.12%, a mean error of 12.38, and very low RPD and RPIQ values of 0.59 and 0.51 respectively. The created algorithm for water retention at the drained upper limit performed better, with an RMSE of 8.89%, a mean error of 1.98, and an RPD and RPIQ of 1.13 and 0.99 respectively. The OSSL model for water retention at the lower limit performed very poorly, with an RMSE, mean error, RPD, and RPIQ of 20.85%, 19.24, 0.39, and 0.5 respectively, while the created algorithm performed better, with an RMSE, mean error, RPD, and RPIQ of 8.35%, 1.57, 0.97, and 1.24 respectively. These results support the idea that local calibrations are necessary for accurate and reliable predictions.
The eventual compilation of sufficient local calibrations can lead to effective regional calibration algorithms, and sufficient regional algorithms can, in turn, lead to the effective and accurate use of international algorithms such as those of the OSSL.