The Efficiency of Different Feature Selection Methods in Digital Mapping of Subgroup and Soil Family Classes with Data Mining Algorithms

Article Type:
Research/Original Article (accredited ranking)
Abstract:
Background and Objectives

Highly accurate soil maps are a powerful tool for achieving land sustainability in agriculture and natural resources management. This study evaluated how different feature selection methods, combined with machine-learning algorithms, affect the digital mapping of soil classes at two taxonomic levels, subgroup and family, in the Vargar lands of Abdanan city, Ilam province.

Materials and Methods

The study area covers 1027 hectares, with a mean annual precipitation of 628.6 mm and a mean annual temperature of 22.6 ºC. Three major physiographic units were considered: hill land, piedmont plain and alluvial plain. Soil moisture and temperature regimes were calculated with the Newhall model in JNSM version 6.1. A total of 44 soil profiles were located with a random sampling pattern following standard soil survey procedures; the profiles were dug and described, all genetic horizons were sampled, and the samples were transferred to the laboratory. All soil profiles were then classified down to the family level according to Soil Taxonomy (2014). Geomorphometric covariates, representing soil-forming factors, were derived from a digital elevation model (ALOS PALSAR satellite, 2011) with 12.5 m resolution in SAGA GIS version 7.4. Three feature selection approaches, Boruta, variance inflation factor (VIF) and mean decrease accuracy (MDA), were combined with two data mining algorithms, random forest (RF) and fuzzy logic, to model soil-landscape relationships, using the "randomForest" and "caret" packages in R 3.5.1 and SoLIM Solutions (version 2015). A sample-based project was used to predict soil classes in the fuzzy logic modeling. The observed profiles were split into two data sets, 80 percent (n=36) for calibration and 20 percent (n=8) for validation, based on the bootstrap sampling of the random forest algorithm. Internal validation of the random forest algorithm was based on the out-of-bag error percentage (OOB%). The best model was identified by overall accuracy (OA) and the kappa index; for each individual class, user's accuracy (UA) and producer's accuracy (PA) were also calculated.
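The four validation measures named above (OA, kappa, UA and PA) can all be read off a confusion matrix. The following is a minimal sketch of that calculation, not the authors' R workflow; the function name `accuracy_metrics` and the two-class counts are hypothetical and are not taken from the study.

```python
def accuracy_metrics(matrix):
    """Compute OA, kappa, PA and UA from a square confusion matrix.

    matrix[i][j] = number of validation profiles whose reference class is i
    and whose mapped (predicted) class is j.
    """
    k = len(matrix)
    n = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(k))
    ref_totals = [sum(matrix[i]) for i in range(k)]                          # row sums
    map_totals = [sum(matrix[i][j] for i in range(k)) for j in range(k)]     # column sums

    oa = correct / n
    # Agreement expected by chance, from the marginal totals
    pe = sum(ref_totals[i] * map_totals[i] for i in range(k)) / (n * n)
    kappa = (oa - pe) / (1 - pe) if pe < 1 else 0.0

    # Producer's accuracy: correct / reference total; user's accuracy: correct / mapped total
    pa = [matrix[i][i] / ref_totals[i] if ref_totals[i] else None for i in range(k)]
    ua = [matrix[i][i] / map_totals[i] if map_totals[i] else None for i in range(k)]
    return oa, kappa, pa, ua


# Hypothetical counts for 8 validation profiles and two classes (illustration only)
example = [[6, 0],
           [1, 1]]
oa, kappa, pa, ua = accuracy_metrics(example)
print(oa, kappa, pa, ua)
```

With imbalanced classes, as in this made-up matrix, overall accuracy can be high while kappa stays noticeably lower, because kappa discounts the agreement expected by chance.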

Results

The results showed that, out of 40 geomorphometric covariates, MDA selected six as the best environmental covariates: Terrain classification index for lowlands, Annual insolation, Topographic position index, Upslope curvature, Real surface area and Terrain surface convexity. At the subgroup level, the RF-MDA method, with an overall accuracy of 84% and a kappa index of 0.56, had the best performance compared to the other methods (RF-VIF, RF-Boruta and Fuzzy-MDA), which had overall accuracies of 58, 55 and 50% and kappa indices of 0.3, 0.67 and 0.18, respectively. The out-of-bag errors (OOB%) for RF-MDA, RF-VIF and RF-Boruta were 72.42%, 67.86% and 82.76% at the subgroup level and 93.10%, 93.10% and 86.21% at the family level, respectively. At the family level, however, there was little difference in accuracy among the methods, which gave similar results in modeling the soil classes. For the fuzzy approach, the kappa index and overall accuracy were similar to those of the other three scenarios at the soil family level, with only a slight difference, whereas at the subgroup level they were lower than in the other scenarios. In contrast to RF modeling, the fuzzy approach represented continuous spatial variability by generating a fuzzy map for each soil class in the landscape. These results indicate that the random forest method is superior to the fuzzy method in mapping soil families and subgroups. Based on the MDA sensitivity analysis index, three geomorphometric covariates had the highest importance for predicting soil classes at both taxonomic levels: Terrain surface convexity (convexity), Terrain classification index for lowlands (TCI_Low) and Real surface area (Surface_Ar).
In the final predicted maps, the classes covering the largest area were Fine-silty, carbonatic, hyperthermic Typic Haplustepts (32.70%) at the family level and Typic Calciustolls (48.90%) at the subgroup level. The classes covering the smallest area were Fine-silty, carbonatic, hyperthermic Typic Calciustolls (0.18%) at the family level and Typic Haplustepts (1.85%) at the subgroup level.
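The VIF screening behind the RF-VIF scenario can be sketched as follows. This is an illustration under stated assumptions, not the procedure used in the study: the threshold of 10 is a common rule of thumb rather than a value reported here, the covariate values are synthetic, the helper names are hypothetical, and NumPy is assumed to be available. The VIF of each covariate is taken as the corresponding diagonal element of the inverse correlation matrix, which equals 1 / (1 - R²) from regressing that covariate on the others.

```python
import numpy as np


def variance_inflation_factors(X):
    """Return one VIF per column of X (rows are observations)."""
    corr = np.corrcoef(X, rowvar=False)
    return np.diag(np.linalg.inv(corr))


def drop_collinear(X, names, threshold=10.0):
    """Iteratively drop the covariate with the largest VIF above the threshold."""
    X = np.asarray(X, dtype=float)
    names = list(names)
    while X.shape[1] > 1:
        vif = variance_inflation_factors(X)
        worst = int(np.argmax(vif))
        if vif[worst] <= threshold:
            break
        X = np.delete(X, worst, axis=1)
        del names[worst]
    return names


# Synthetic covariates (illustration only): "slope_copy" nearly duplicates "slope"
rng = np.random.default_rng(0)
slope = rng.normal(size=50)
convexity = rng.normal(size=50)
slope_copy = slope + 0.01 * rng.normal(size=50)

X = np.column_stack([slope, convexity, slope_copy])
kept = drop_collinear(X, ["slope", "convexity", "slope_copy"])
print(kept)
```

Only one of the two near-duplicate covariates survives the screening, while the independent one is retained.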

Conclusion

In general, applying different variable selection approaches where soil classes have relatively imbalanced abundances can increase the accuracy of digital soil mapping. Increasing the number of field observations and including other environmental variables that affect soil formation can also improve the prediction of soil classes that were mapped with low accuracy.

Language:
Persian
Published:
Journal of water and soil, Volume:34 Issue: 4, 2020
Pages:
973 to 987
magiran.com/p2199379  