Determining Features Influencing Some Soil Physical Quality Indicators and their Predictions Using Decision Tree and Multiple Linear Regression Models

Article Type:
Research/Original Article (with accredited rank)
Abstract:
Introduction
Soil quality is defined as the capacity of a soil to function within different land uses and ecosystem boundaries to sustain biological productivity, maintain environmental quality, and promote plant, animal, and human health. Soil quality cannot be measured directly, but it can be evaluated on the basis of several parameters; the choice of parameters depends on the scale and goals of the research. Soil quality indicators (SQIs) are measurable soil characteristics that affect the capacity of the soil for crop production or environmental performance. They are used to evaluate the effects of different management practices and land uses on soil quality, and they can be estimated from easily measured soil physicochemical properties. Air capacity (AC), relative field capacity (RFC), and plant available water capacity (PAWC) are among the most important of these indicators. Selecting appropriate input parameters is the first and most important step in predicting SQIs. Feature selection can be defined as identifying and selecting a subset of useful features from the collected primary data. One feature-selection method uses the Pearson correlation coefficient, which measures the linear correlation between each input variable and the target variable. When the absolute value of the coefficient is close to one, there is a strong relationship between the input and the target variable. Features with correlation coefficients greater than or equal to 0.9 are considered important, and those below this threshold are considered unimportant. The decision tree algorithm is one of the prediction approaches in the statistics and data mining literature. At each split it selects the property with the highest separating capability, and both applying the algorithm and interpreting its results are straightforward. The aims of this study were to select the best set of input properties influencing SQIs using the Pearson correlation coefficient and then to model the effect of these input properties with decision tree and multiple linear regression models.
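The Pearson-based filter described above can be sketched in Python as follows. This is a minimal illustration, not the authors' code. The function names and the toy soil values are hypothetical. Comparing the absolute value of r against the 0.9 cut-off is also an assumption; it lets strongly negative correlations pass, such as the one between bulk density and air capacity in the toy data.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        # r is undefined for a constant column; NaN never passes the threshold
        return math.nan
    return sxy / math.sqrt(sxx * syy)


def select_features(features, target, threshold=0.9):
    """Keep features whose |r| with the target is >= threshold (assumed rule)."""
    scores = {name: pearson_r(values, target) for name, values in features.items()}
    selected = [name for name, r in scores.items() if abs(r) >= threshold]
    return selected, scores


# Hypothetical toy data: five samples, three candidate inputs, target = air capacity
features = {
    "porosity": [0.40, 0.45, 0.50, 0.55, 0.60],
    "bulk_density": [1.60, 1.45, 1.35, 1.20, 1.05],
    "silt": [30, 22, 35, 25, 28],
}
air_capacity = [0.08, 0.11, 0.14, 0.17, 0.20]

selected, scores = select_features(features, air_capacity)
print(selected)
```

With these invented values, porosity and bulk density pass the filter, and silt, whose correlation with the target is near zero, is dropped.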
Materials And Methods
In this study, the Pearson correlation coefficient was used to select the soil properties influencing SQIs. These indicators were then modeled and predicted by the decision tree algorithm using the selected input properties. For this purpose, 104 soil samples were collected from the surface layer (0-15 cm depth) under four land uses: a garden with 20-year-old walnut trees, pasture, agriculture, and mountain almond. The sampling area is semi-arid and lies in Iran (Rabor region, 29°27′ N to 38°54′ N and 56°45′ E to 57°16′ E). A multiple linear regression (MLR) model was constructed as a benchmark for comparing performance. A sensitivity analysis of the decision tree model with respect to the input variables was performed using the StatSoft method. The predictive capabilities of the models were evaluated with the mean absolute error (MAE), root mean square error (RMSE), and coefficient of determination (R2) between measured and predicted SQI values.
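The three evaluation indices can be written out explicitly. The sketch below assumes R2 is computed as 1 - SS_res/SS_tot. The abstract does not say whether the authors used this form or the squared Pearson correlation between measured and predicted values, and the two can differ when predictions are biased. The sample values are hypothetical.

```python
import math


def _check(measured, predicted):
    """Both series must be non-empty and of equal length."""
    if not measured or len(measured) != len(predicted):
        raise ValueError("measured and predicted must be non-empty and equal in length")


def mae(measured, predicted):
    """Mean absolute error."""
    _check(measured, predicted)
    return sum(abs(m - p) for m, p in zip(measured, predicted)) / len(measured)


def rmse(measured, predicted):
    """Root mean square error."""
    _check(measured, predicted)
    return math.sqrt(sum((m - p) ** 2 for m, p in zip(measured, predicted)) / len(measured))


def r2(measured, predicted):
    """Coefficient of determination as 1 - SS_res / SS_tot (assumed form)."""
    _check(measured, predicted)
    mean_m = sum(measured) / len(measured)
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    ss_tot = sum((m - mean_m) ** 2 for m in measured)
    return 1.0 - ss_res / ss_tot


# Hypothetical measured and predicted SQI values
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
print(mae(measured, predicted), rmse(measured, predicted), r2(measured, predicted))
```

Lower MAE and RMSE and an R2 closer to 1 indicate a better model. In this form, R2 can become negative for a model that predicts worse than the mean.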
Results And Discussion
The following soil properties were selected as important input parameters:
- air capacity: porosity, bulk density, and clay and sand content;
- relative field capacity: porosity and sand, clay, and silt content;
- plant available water: bulk density, electrical conductivity, porosity, and sand, clay, and silt content.

The R2 values of the decision tree model for AC, RFC, and PAWC were 0.95, 0.84, and 0.85, respectively, whereas those of the multiple linear regression model were 0.63, 0.62, and 0.61, respectively. According to the evaluation indices, the conventional regression model performed poorly in predicting SQIs. Conventional regression techniques (i.e., multiple linear regression) may therefore not be reliable for predicting these indicators. The sensitivity analysis of the decision tree model identified the inputs with the greatest influence on each indicator: porosity and bulk density for air capacity, porosity for relative field capacity, and bulk density for plant available water.
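The abstract does not describe the StatSoft sensitivity procedure in detail. One common form of such an analysis, assumed here, replaces one input at a time with its mean. It then divides the resulting RMSE by the baseline RMSE, so a ratio well above 1 marks an influential input. The toy linear model below is a hypothetical stand-in for the fitted decision tree, and all data values are invented for illustration.

```python
import math


def rmse(measured, predicted):
    """Root mean square error."""
    return math.sqrt(sum((m - p) ** 2 for m, p in zip(measured, predicted)) / len(measured))


def mean_substitution_sensitivity(predict, X, y, names):
    """Ratio of RMSE with input j replaced by its mean to the baseline RMSE."""
    baseline = rmse(y, [predict(row) for row in X])
    if baseline == 0:
        raise ValueError("baseline error is zero; the ratio is undefined")
    ratios = {}
    for j, name in enumerate(names):
        col_mean = sum(row[j] for row in X) / len(X)
        # Copy each row with column j overwritten by the column mean
        X_sub = [row[:j] + [col_mean] + row[j + 1:] for row in X]
        ratios[name] = rmse(y, [predict(row) for row in X_sub]) / baseline
    return ratios


def toy_model(row):
    """Hypothetical fitted model standing in for the paper's decision tree."""
    porosity, bulk_density = row
    return 0.6 * porosity - 0.01 * bulk_density - 0.14


# Hypothetical samples: [porosity, bulk density] -> air capacity
X = [[0.40, 1.60], [0.45, 1.45], [0.50, 1.35], [0.55, 1.20], [0.60, 1.05]]
y = [0.08, 0.11, 0.14, 0.17, 0.20]

ratios = mean_substitution_sensitivity(toy_model, X, y, ["porosity", "bulk_density"])
for name, ratio in sorted(ratios.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {ratio:.2f}")
```

With these toy values, substituting the mean for porosity inflates the error several-fold, while the bulk-density ratio stays close to 1. Porosity would therefore be ranked as the more influential input.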
Conclusion
This research provided a basis for predicting soil physical quality indicators and for identifying the important parameters affecting these indicators in agricultural, grassland, and forest soils of semi-arid regions, and the approach can be generalized to other areas. Further studies are needed to assess the effects of the selected input variables under different conditions.
Language:
Persian
Published:
Journal of Water and Soil, Volume 32, Issue 2, 2018
Pages:
327 to 342
magiran.com/p1843588  