Predicting Regional Spatial Distribution of Soil Texture Fractions in Sistan Flood Plain using Random Forest Method
Soil properties are highly spatially variable in floodplains. Soil texture is an important physical soil property that affects many agricultural and environmental activities because it strongly influences the water retention curve, fertility, drainage and porosity. Knowledge of its spatial distribution is therefore essential, especially in alluvial plains and at large scales. Field studies of the spatial distribution of soil properties, especially at large scales and in floodplains, may require a large number of soil samples, which is costly and time-consuming. Digital soil mapping (DSM) coupled with remote sensing data has had a significant impact on predicting the spatial distribution of soil properties.
The aim of this study was to predict the spatial distribution of soil texture fractions in the Sistan floodplain at a regional scale (area 1300 km²). A total of 160 soil samples were collected from the surface layer (0–30 cm) of various soil series in the agricultural land of the Sistan plain, and the soil texture fractions (percentages of sand, silt and clay) were measured. Remotely sensed data, including Landsat 8 Bands 1 to 8, the Band 4/Band 8 and Band 4/Band 3 ratios, the NDVI, the brightness index, the clay index and the grain size index, were used as auxiliary variables for spatial prediction of the soil texture fractions. The random forest (RF) technique was used to examine the relationship between the auxiliary variables and the soil texture fractions. Random forest is an extension of the classification and regression tree (CART) model in which hundreds or thousands of trees are produced. 80% of the data were used for prediction and 20% for validation, and the root mean square error (RMSE), normalized RMSE (nRMSE), Willmott index (dr), efficiency factor (EF), mean bias error (MBE) and mean absolute error (MAE) were used for evaluation.
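As an illustration of the evaluation step, the listed statistics can be computed from paired observed and predicted values as in the minimal sketch below. It is not the authors' code. The function name `evaluate` and the sample values are hypothetical, and the formulas are assumptions: nRMSE is taken as RMSE divided by the observed mean, MBE as the mean of predicted minus observed, EF as the Nash-Sutcliffe form of modelling efficiency, and dr as Willmott's refined index of agreement with c = 2. The study may have used different conventions.

```python
import math


def evaluate(observed, predicted):
    """Agreement metrics for paired observed and predicted values."""
    n = len(observed)
    if n == 0 or n != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")

    mean_obs = sum(observed) / n
    errors = [p - o for o, p in zip(observed, predicted)]

    abs_err = sum(abs(e) for e in errors)                # sum of |P - O|
    sq_err = sum(e * e for e in errors)                  # sum of (P - O)^2
    abs_dev = sum(abs(o - mean_obs) for o in observed)   # sum of |O - mean(O)|
    sq_dev = sum((o - mean_obs) ** 2 for o in observed)  # sum of (O - mean(O))^2
    if sq_dev == 0 or mean_obs == 0:
        raise ValueError("observed values must vary and have a non-zero mean")

    rmse = math.sqrt(sq_err / n)

    # Refined index of agreement (assumed form, scaling constant c = 2).
    c = 2.0
    if abs_err <= c * abs_dev:
        dr = 1.0 - abs_err / (c * abs_dev)
    else:
        dr = c * abs_dev / abs_err - 1.0

    return {
        "RMSE": rmse,
        "nRMSE": rmse / mean_obs,   # assumed normalisation by the observed mean
        "MBE": sum(errors) / n,     # assumed sign: predicted minus observed
        "MAE": abs_err / n,
        "EF": 1.0 - sq_err / sq_dev,
        "dr": dr,
    }


# Illustrative values only, not data from the study.
metrics = evaluate([10.0, 20.0, 30.0, 40.0], [12.0, 18.0, 33.0, 37.0])
print(metrics)
```

Under these assumptions, values of EF and dr close to 1 and values of RMSE, MAE and MBE close to 0 indicate better agreement between predicted and measured fractions.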
Pearson's correlation analysis showed that, among the soil texture fractions, sand content was significantly correlated with the largest number of environmental variables. Band 8 had the highest correlation with sand, silt and clay content. The findings show that the use of remote sensing data increased the accuracy of the predictions. The RMSE and MAE values were lower for the prediction set than for the validation set, whereas the MBE values were similar for both sets. The RMSE values for estimating the percentages of sand, silt and clay at the validation sites using the RF method were 15.42, 12.56 and 8.97%, respectively. The RMSE values of estimation by ordinary kriging were 18.2, 9.53 and 15.1% for sand, silt and clay, respectively, which were 18, 5.9 and 11.2% higher than those obtained by the RF model. The nRMSE values were 0.19, 0.13 and 0.2 for the prediction dataset and 0.39, 0.21 and 0.34 for the validation dataset for the sand, silt and clay fractions, respectively. The values of Willmott's index of agreement (dr) indicate that the modeling was carried out with acceptable accuracy, and the efficiency factor (EF) values indicate that the RF model produced accurate spatial maps of the soil texture fractions in the study area. Thus, the RF method combined with remotely sensed data is suitable for mapping soil texture fractions at a regional scale. Among the auxiliary variables, the clay index and the grain size index were the most important environmental variables for predicting soil texture by the random forest method in the study area.
Other environmental variables, such as the Band 4/Band 8 ratio, Band 1, Band 8 and Band 7, also influenced the prediction of soil texture fractions.
Remote sensing data combined with the random forest model can be used to predict the spatial distribution pattern of soil texture fractions in large-scale floodplains with a hot and dry climate. The RMSE values were higher for sand and silt than for clay, which could be due to the wider ranges of silt and sand content over the study region. Another reason could be the number of samples used. Therefore, to improve the accuracy of soil property maps, especially for physical properties, it is recommended that the number of soil sampling points be increased and that optimal sampling locations in these areas be determined. Future work should evaluate other covariates, such as land use maps, distance from the river, soil series and salinity maps, or remote sensing data of finer spatial resolution, as well as hyperspectral visible and near-infrared reflectance spectroscopy, for regional spatial prediction of soil texture fractions in floodplains.