Integrated noise reduction-data mining method for soil organic matter prediction by VNIR spectrometry

Article Type:
Research/Original Article (holds a valid ranking)
Abstract:

Background and Objective:

Soil, a heterogeneous natural resource and the largest store of organic carbon in terrestrial ecosystems, is governed by complex processes and mechanisms. The need to estimate soil properties accurately at national and regional scales, both to improve soil management and to understand their influence on agriculture, has drawn researchers' attention to this field. Soil Organic Matter (SOM) is regarded as an indicator of soil quality for fertility and food production, and it is a key variable in environmental and agricultural issues. Rapid, cost-effective and more accurate estimation of SOM content can therefore support soil resource assessment and management. Precision agriculture requires a very large volume of soil data for managing land and crops, and field data collection usually cannot meet that need. Sampling, preparing and analyzing large numbers of soil samples, and producing distribution maps for large areas, are very difficult. In addition, traditional laboratory methods of soil analysis are tedious, time-consuming and costly, and they require specialized laboratory operators. The aim of the present study is to compare the performance of Partial Least Squares Regression (PLSR) and Boosted Regression Tree (BRT) models for predicting SOM from VNIR spectrometry data. Noise in the soil spectroscopic data was reduced by combining the wavelet transform with the identification of independent bands, and the independent and effective spectra and bands for SOM spectroscopy were selected. Accordingly, Wavelet-PCA-PLSR and Wavelet-PCA-BRT models were developed and their performance was assessed.

Materials and Methods:

Forty-two surface (0-30 cm) soil samples were collected from heterogeneous urban-agricultural areas of Tehran province. Soil Organic Carbon (OC) was measured by the Walkley-Black method, and the spectra of the samples were measured with an ASD FieldSpec-3 spectrometer. The first and second derivatives of spectral reflectance and absorbance were calculated. To reduce noise and smooth the spectra, the Sym8 wavelet function was used; the wavelet transform serves to represent and reconstruct features in the spectrum. Principal component analysis (PCA) and Hotelling's T² test at the 95% confidence level were used for outlier detection. PLSR and BRT were applied to reflectance, absorbance and their first and second derivatives at five levels of wavelet transform, and the appropriate model was then selected by comparing the validation results. Kernel functions were used to apply PLSR to nonlinear data. For a numerical response, regression trees are used instead of decision trees, although the procedure is the same. The regression trees were grown with a greedy algorithm: the root node and its two children are obtained by finding the binary question that yields the most information about the response variable. Tree construction is repeated recursively until a stopping criterion is met, for example when a node cannot be split further or would yield too few data, or when a node contains 5% of the total data. Moreover, the tree size should be minimized. Nodes were split by minimizing criteria such as the Gini index and entropy. In addition, the sum of squared errors is calculated for each branch, and the split with the minimum value is selected. Pruning is applied to the regression tree to counter over-fitting. BRT combines two techniques, regression trees and boosting, to improve the predictive performance of each.
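The greedy splitting and boosting steps described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a single hypothetical predictor `x` (the study used many spectral bands), depth-one trees (stumps), a squared-error split criterion, and illustrative values for `n_rounds` and `learning_rate`. All function names are hypothetical.

```python
from statistics import fmean


def sse(values):
    """Sum of squared errors around the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    m = fmean(values)
    return sum((v - m) ** 2 for v in values)


def best_split(x, y, min_leaf=1):
    """Greedy search for the threshold on x that minimizes left + right SSE.

    Returns (threshold, total_sse); threshold is None if no split improves
    on the unsplit node or if min_leaf cannot be satisfied.
    """
    pairs = sorted(zip(x, y))
    best = (None, sse(y))
    for i in range(min_leaf, len(pairs) - min_leaf + 1):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between identical predictor values
        left = [p[1] for p in pairs[:i]]
        right = [p[1] for p in pairs[i:]]
        total = sse(left) + sse(right)
        if total < best[1]:
            best = ((pairs[i - 1][0] + pairs[i][0]) / 2, total)
    return best


def fit_stump(x, y, min_leaf=1):
    """Fit a depth-one regression tree: (threshold, left_mean, right_mean)."""
    thr, _ = best_split(x, y, min_leaf)
    if thr is None:  # no useful split: predict the node mean everywhere
        m = fmean(y)
        return (float("inf"), m, m)
    left = [yi for xi, yi in zip(x, y) if xi <= thr]
    right = [yi for xi, yi in zip(x, y) if xi > thr]
    return (thr, fmean(left), fmean(right))


def predict_stump(stump, xi):
    thr, left_value, right_value = stump
    return left_value if xi <= thr else right_value


def boost(x, y, n_rounds=50, learning_rate=0.1, min_leaf=1):
    """Boosting for squared error: each stump is fitted to current residuals."""
    base = fmean(y)
    residuals = [yi - base for yi in y]
    stumps = []
    for _ in range(n_rounds):
        stump = fit_stump(x, residuals, min_leaf)
        stumps.append(stump)
        residuals = [r - learning_rate * predict_stump(stump, xi)
                     for r, xi in zip(residuals, x)]
    return base, stumps, learning_rate


def predict(model, xi):
    base, stumps, learning_rate = model
    return base + learning_rate * sum(predict_stump(s, xi) for s in stumps)
```

A stopping rule like the 5% node size mentioned above could be approximated by passing `min_leaf=max(1, round(0.05 * len(y)))`; that mapping is an assumption of this sketch.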
For calibration and validation of the models, 30 and 12 soil samples, respectively, were randomly selected, and R² and RMSE were used to quantify model accuracy. Moreover, to select the best number of factors for the PLSR model, the explained variance, the residual values and the validation RMSE were considered. Finally, a soil organic matter map of the study area was produced using Landsat OLI satellite imagery and the validated method.
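The random 30/12 split and the two accuracy measures can be sketched as below. This is an illustration under stated assumptions: R² is taken here as the coefficient of determination, 1 - SS_res/SS_tot (the abstract does not say whether this or the squared correlation was used), and the function names and the `seed` parameter are hypothetical.

```python
import math
import random


def rmse(observed, predicted):
    """Root mean squared error between observed and predicted values."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def r2(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot (an assumption here)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def split_indices(n=42, n_cal=30, seed=0):
    """Randomly assign n sample indices to calibration and validation sets."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility
    return sorted(indices[:n_cal]), sorted(indices[n_cal:])
```

With only 12 validation samples, both measures depend noticeably on which samples fall in the validation set, so fixing the seed helps when comparing models.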

Results and Discussion:

Estimating SOM with acceptable accuracy, and producing more accurate continuous maps through noise reduction and retention of suitable data, have long attracted researchers' attention. The present study sought a better method for more accurate quantification of SOM from soil spectroscopic data. Using the wavelet transform and PCA-based outlier removal with Hotelling's T², suitable data were extracted for more accurate quantification. In this method, independent and effective bands or spectra remain in the model, whereas Lin et al. used the wavelet transform and correlation techniques to select appropriate bands for estimating SOM. Because soil reflectance is complex and affected by several factors, the correlation method does not lead to acceptable results in heterogeneous areas such as the one studied here. PCA, an unsupervised method, computes the principal components, eigenvalues and eigenvectors from the data values and seeks the directions of maximum variance, based on Singular Value Decomposition (SVD). SOM estimation models were developed using PLSR and BRT for the reflectance and absorbance spectra and their first and second derivatives. BRT gave the best results on the second derivative of reflectance, with RMSE and R² values of 0.58 and 0.94, respectively. For the same data, PLSR yielded RMSE and R² values of 1.0338 and 0.938, respectively. Comparing the RMSE values shows that the BRT model performed better.
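The spectral transforms compared above (absorbance and first and second derivatives) can be sketched as follows. This is an assumption-laden illustration: absorbance is computed with the common convention log10(1/R), and the derivatives use simple finite differences, since the abstract does not state how the derivatives were calculated. The function names are hypothetical.

```python
import math


def absorbance(reflectance):
    """Apparent absorbance log10(1/R) for reflectance values in (0, 1]."""
    return [math.log10(1.0 / r) for r in reflectance]


def derivative(values, step=1.0):
    """First derivative: central differences inside, one-sided at the ends."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two points")
    d = [0.0] * n
    d[0] = (values[1] - values[0]) / step
    d[-1] = (values[-1] - values[-2]) / step
    for i in range(1, n - 1):
        d[i] = (values[i + 1] - values[i - 1]) / (2 * step)
    return d


def second_derivative(values, step=1.0):
    """Second derivative obtained by differentiating twice."""
    return derivative(derivative(values, step), step)
```

Differentiation amplifies high-frequency noise, which is one reason smoothing (here, the wavelet transform) is usually applied before derivative spectra are modeled.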

Conclusion:

Field measurement of soil chemical properties such as organic matter is time-consuming and costly, and it is not feasible for large numbers of samples. The results of the present study indicate that, in heterogeneous agricultural-urban areas, the developed wavelet-PCA-PLSR and wavelet-PCA-BRT models can be used to estimate SOM. Moreover, these two algorithms make no strong distributional assumptions, such as normality. Using continuous functions and satellite imagery, large-scale maps of SOM levels can be prepared for use in studies of cultivation potential, soil fertility and sustainable soil development.

Language:
Persian
Published:
Journal of RS and GIS for Natural Resources, Volume: 13, Issue: 3, 2022
Pages:
1 to 5
magiran.com/p2497024  