Hongyi Li

and 4 more

The aim of this paper was to compare the prediction performance of three strategies for predicting soil organic matter in the target area: global partial least squares regression (PLSR) using CSSL with and without spiking samples, memory-based learning (MBL) using CSSL with and without spiking samples, and general PLSR using only spiking samples. For the spiked models, we also investigated the prediction performance of extra-weighted spiking subsets. A series of spiking subsets was drawn at random from the total spiking samples, which were selected from the target sites by conditioned Latin hypercube sampling (cLHS). We calculated the mean squared Euclidean distance (msd) of the different spiking subsets using the distribution density functions of their vis–NIR spectra only and statistically inferred an optimal spiking set size of 30. When the number of spiking samples was lower than 30, the prediction accuracy of global PLSR using CSSL spiked with or without extra-weighted samples was higher than that of general PLSR using only the corresponding number of spiking samples (RMSE 5.57–5.98 vs. 6.76). Global PLSR using CSSL spiked with the statistically optimal local samples achieved higher prediction performance (mean RMSE of 5.75). MBL spiked with five extra-weighted optimal spiking samples achieved the best accuracy, with an RMSE of 3.98, an R² of 0.70, a bias of 0.04 and an LCCC of 0.81. The msd is a simple and effective method to determine an adequate spiking size using only vis–NIR data.
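The accuracy statistics reported above (RMSE, bias and LCCC) can be illustrated with a minimal sketch. It assumes the standard definitions: bias taken as the mean of predicted minus observed values, and Lin's concordance correlation coefficient computed with population (1/n) variances. The function names and the sample values are hypothetical and are not taken from the paper.

```python
import math

def rmse(obs, pred):
    """Root mean squared error between observed and predicted values."""
    n = len(obs)
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(obs, pred)) / n)

def bias(obs, pred):
    """Mean error; the sign convention (predicted minus observed) is an assumption."""
    return sum(p - o for o, p in zip(obs, pred)) / len(obs)

def lccc(obs, pred):
    """Lin's concordance correlation coefficient with 1/n variances."""
    n = len(obs)
    mean_o = sum(obs) / n
    mean_p = sum(pred) / n
    var_o = sum((o - mean_o) ** 2 for o in obs) / n
    var_p = sum((p - mean_p) ** 2 for p in pred) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(obs, pred)) / n
    # Penalises both scatter (through cov) and systematic shift (mean difference)
    return 2.0 * cov / (var_o + var_p + (mean_o - mean_p) ** 2)

# Hypothetical observed and predicted soil organic matter values
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 22.0, 32.0, 42.0]
print(rmse(observed, predicted), bias(observed, predicted), lccc(observed, predicted))
```

Unlike R², the LCCC drops below 1 for predictions that are perfectly correlated with the observations but systematically shifted, which is why it is often reported alongside the bias.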

Meihua Yang

and 2 more

In situ visible near-infrared diffuse reflectance spectroscopy (VNIR) is a rapid sensing approach that can provide analytically dense soil data reflecting multiple physical and chemical properties of soil. A total of 246 in situ soil samples were collected and scanned in 2016–2018. The 2016–2017 dataset was used as the calibration dataset to develop the dry ground model and, using the dry and in situ spectra, the in situ correction matrix. The 2018 dataset, using the in situ spectra, was used as the validation dataset. Four in situ correction methods, external parameter orthogonalization (EPO), direct standardization (DS), piecewise direct standardization (PDS), and generalized least squares weighting (GLSW), were used to remove the in situ effect on the spectra. In addition, two models, partial least squares regression (PLSR) and support vector machine (SVM), were used to evaluate the effectiveness of the prediction. The results showed that the four in situ corrections could remove the error introduced by in situ measurement to some extent. The four in situ corrections, when combined with SVM, reduced the errors caused by in situ measurement more than the same corrections combined with PLSR. EPO correction outperformed the other three methods, and EPO-SVM obtained the best prediction, with the lowest RMSE (1.91 g kg⁻¹) and the highest Lin’s concordance correlation coefficient (LCCC) (0.84). We conclude that the EPO-SVM method using in situ spectra can predict soil organic carbon in the Poyang Lake area in a rapid and minimally invasive manner.
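The idea behind EPO can be sketched as follows, under strong simplifying assumptions: paired dry and in situ spectra are differenced, the dominant direction of those differences is estimated, and that direction is projected out of every spectrum. This sketch removes a single component found by power iteration, whereas the number of components is normally tuned; the function names and the three-band toy spectra are hypothetical and do not reproduce the authors' implementation.

```python
import math

def leading_direction(D, iters=100):
    """Dominant right singular vector of the difference matrix D (power iteration)."""
    p = len(D[0])
    v = [float(j + 1) for j in range(p)]          # arbitrary non-uniform start vector
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iters):
        # w = D^T (D v), computed without forming D^T D explicitly
        Dv = [sum(row[j] * v[j] for j in range(p)) for row in D]
        w = [sum(D[i][j] * Dv[i] for i in range(len(D))) for j in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            raise ValueError("difference spectra carry no signal along the start vector")
        v = [x / norm for x in w]
    return v

def epo_correct(X, v):
    """Project each spectrum onto the subspace orthogonal to v: x - (x . v) v."""
    corrected = []
    for x in X:
        score = sum(a * b for a, b in zip(x, v))
        corrected.append([a - score * b for a, b in zip(x, v)])
    return corrected

# Toy data: the "in situ effect" shifts the first two bands by a sample-specific amount
dry = [[1.0, 2.0, 3.0], [2.0, 1.0, 0.5], [0.5, 0.5, 2.0]]
effect = [0.3, 0.8, 0.5]
in_situ = [[x[0] + c, x[1] + c, x[2]] for x, c in zip(dry, effect)]

D = [[a - b for a, b in zip(s, d)] for s, d in zip(in_situ, dry)]  # difference spectra
v = leading_direction(D)
dry_epo = epo_correct(dry, v)
in_situ_epo = epo_correct(in_situ, v)
```

In this toy case the corrected dry and in situ spectra coincide, because the simulated effect lies entirely along one direction. A regression model calibrated on the projected dry spectra could then be applied to projected in situ spectra.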