Digital soil mapping by machine learning techniques

Article Type:
Research/Original Article (accredited ranking)
Abstract:
Background and Objectives

Digital soil mapping (DSM) is the broad term for the use of geospatial techniques to map soils. Soil maps serve as base maps in many environmental and natural resources studies. Digital soil maps are built on the relationship between environmental variables and soil properties. Advances in computing have enabled digital, quantitative approaches to soil mapping. Continuous use of agricultural land without regard to land suitability has caused soil degradation. In addition, the shortcomings of conventional methods, together with the advent of geographic information systems (GIS) and remote sensing (RS) techniques, have led to the emergence and adoption of digital soil mapping.

Methodology

The study area covers approximately 5000 ha in the west of the Heris region, East Azerbaijan province, Iran. The first study investigated the potential of different models to predict soil classes at different taxonomic levels. Following a semi-detailed soil survey and a stratified random sampling scheme, 50 pedons and 50 auger holes, approximately 1000 m apart, were excavated and described, and soil samples were taken from the different genetic horizons. Based on the pedon descriptions and soil analytical data, the pedons were classified down to the family level. Four machine learning techniques, namely boosted regression tree (BRT), random forest (RF), artificial neural networks (ANNs), and multinomial logistic regression (MLR), were tested for their power to predict and map the soil classes. After the soil property maps had been prepared and their accuracy checked, they were used together with auxiliary parameters to estimate soil classes with an artificial neural network model in R. Finally, the accuracy and uncertainty of the model were evaluated using overall accuracy and the confusion index, respectively.
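
The two evaluation measures named above can be illustrated with a minimal sketch. The abstract does not give the formulas, so this assumes the definitions commonly used in DSM work: overall accuracy as the share of validation sites whose predicted class matches the observed class, and the confusion index as one minus the difference between the two highest predicted class probabilities at a site. The function names, class labels, and probabilities are hypothetical and are not taken from the study.

```python
from typing import Sequence


def overall_accuracy(observed: Sequence[str], predicted: Sequence[str]) -> float:
    """Fraction of validation sites whose predicted class equals the observed class."""
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(o == p for o, p in zip(observed, predicted))
    return hits / len(observed)


def confusion_index(probabilities: Sequence[float]) -> float:
    """Assumed definition: CI = 1 - (p_max - p_second).

    Values near 0 mean one class clearly dominates (low uncertainty);
    values near 1 mean the two most likely classes are nearly tied.
    """
    if len(probabilities) < 2:
        raise ValueError("need at least two class probabilities")
    top, second = sorted(probabilities, reverse=True)[:2]
    return 1.0 - (top - second)


# Hypothetical validation data, for illustration only (not from the study).
observed = ["Haplocalcids", "Haplocalcids", "Haplogypsids", "Calcixerepts"]
predicted = ["Haplocalcids", "Haplogypsids", "Haplogypsids", "Calcixerepts"]

# Hypothetical per-site class probabilities, one row per validation site.
site_probs = [
    [0.80, 0.15, 0.05],
    [0.40, 0.45, 0.15],
    [0.10, 0.85, 0.05],
    [0.30, 0.30, 0.40],
]

oa = overall_accuracy(observed, predicted)
ci = [confusion_index(p) for p in site_probs]

print(f"overall accuracy: {oa:.2f}")
print("confusion index per site:", [round(v, 2) for v in ci])
```

In this toy data the fourth site is classified correctly but has a high confusion index, which is the kind of case where accuracy and uncertainty tell different stories about a model.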

Results

The different models showed similar ability to predict the soil classes across all taxonomic levels, but their accuracy decreased considerably at the subgroup and family levels. Terrain attributes were the most important auxiliary information for predicting the soil classes down to the family level. The main goal of the second study was to predict soil surface properties (pH, electrical conductivity, gypsum, organic carbon, calcium carbonate equivalent, coarse fragments, and particle size distribution) using ANNs, BRT, generalized linear model (GLM), and multiple linear regression (MLR). Among the studied models, GLM performed best for most soil properties, although the best model does not necessarily give an accurate estimate. Terrain attributes were also the most important environmental covariates for predicting the soil classes at all taxonomic levels, but they could not fully explain the soil variation. This indicates that the unexplained variation is controlled by unobserved environmental factors, which may result from management over time. The results suggest that DSM approaches are not accurate enough for soil classes at the lower taxonomic levels, which focus on the soil properties affecting land use and management. Adding more detail to the classification at the lower levels of the Soil Taxonomy system increases the number of classes, which lowers overall accuracy and raises uncertainty. Notably, the ANN model achieved good accuracy up to the great group level, meeting the acceptable level of overall accuracy (i.e., 75%), yet it had a high degree of uncertainty. Therefore, accuracy alone should not determine model selection; the model's uncertainty must be considered alongside its error.
 

Conclusion

Terrain attributes were the main predictors among the auxiliary information studied. More observations are recommended to improve the accuracy of the estimates and to give a better understanding of the performance of the DSM approach in low-relief areas. Further studies may still be required to identify new environmental covariates and introduce new tools to capture the complex nature of soils. Accordingly, we suggest using other soft computing methods for modeling in plains or low-relief regions. Finally, the use of DSM methods is increasing over time, and they will eventually be regarded as distinct and novel techniques.

Language:
Persian
Published:
Journal of Soil and Plant Science, Volume:34 Issue: 4, 2024
Pages:
1 to 14
https://www.magiran.com/p2815623