Using Ensemble Model Approach for Spatial Modeling of Soil Imbalanced Classes

Author(s):

Mastaneh Rahimi Mashkaleh, Mohammad Amir Delavar *, Mohammad Jamshidi

Article Type:

Research/Original Article (accredited ranking)

Abstract:

Introduction

Imbalanced data is a widespread challenge for machine learning algorithms, and the classification of imbalanced data has become an important research area in data mining. The problem arises when one class contains few instances while the other classes contain many, which makes it difficult for learning algorithms to recognize the minority class. Researchers are therefore refining methods and models for classifying imbalanced data in order to improve classification accuracy. The main objective of this study is to correctly detect and classify samples from the minority classes and thereby improve the accuracy of soil class prediction. The study area covers the southwestern part of Zanjan province.

Materials and Methods

A total of 148 soil profiles were excavated on a regular grid with an average spacing of 500 meters (up to 700 meters in some locations, based on expert recommendations). The samples were air-dried and transported to the laboratory, where physical and chemical analyses were carried out on all of them: soil texture, pH, calcium carbonate equivalent, cation exchange capacity, electrical conductivity, organic carbon content, and gypsum content. The soils were then described and classified to the family level according to the soil classification system. The most appropriate covariates for predicting soil classes were selected from 57 candidates, including geomorphological and geological maps, a digital elevation model (DEM), and Landsat 8 satellite imagery, using principal component analysis (PCA) and expert knowledge. SAGA GIS and ENVI were used to extract the environmental covariates. The soil-landscape relationship was modeled in RStudio with three individual algorithms, multinomial logistic regression (MNLR), random forest (RF), and boosted regression trees (BRT), and with an ensemble model applied after data balancing. To assess model accuracy, the data were randomly divided into a training set of 80% (118 profiles) and a validation set of 20% (30 profiles).
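The random 80/20 split and the balancing step can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say which balancing technique was used, so simple random oversampling stands in for it here, and the per-class profile counts and function names are hypothetical, not taken from the study.

```python
import random
from collections import Counter

def split_train_validation(records, train_fraction=0.8, seed=42):
    """Randomly split records into training and validation lists."""
    rng = random.Random(seed)
    shuffled = records[:]
    rng.shuffle(shuffled)
    n_train = int(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]

def oversample_minority(records, label_of, seed=42):
    """Duplicate randomly chosen records of smaller classes until every
    class present has as many records as the largest class.
    Assumption: random oversampling; the study's method is not stated."""
    rng = random.Random(seed)
    by_class = {}
    for rec in records:
        by_class.setdefault(label_of(rec), []).append(rec)
    target = max(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        balanced.extend(group)
        balanced.extend(rng.choice(group) for _ in range(target - len(group)))
    return balanced

# Hypothetical class counts that sum to the 148 profiles of the study.
hypothetical_counts = {
    "Typic Calcixerepts": 70,
    "Typic Haploxerepts": 40,
    "Typic Xerorthents": 20,
    "Gypsic Haploxerepts": 12,
    "Lithic Xerorthents": 6,
}
profiles = []
for soil_class, count in hypothetical_counts.items():
    for _ in range(count):
        profiles.append((len(profiles), soil_class))  # (profile id, class)

train, validation = split_train_validation(profiles)
balanced_train = oversample_minority(train, label_of=lambda rec: rec[1])
class_sizes = Counter(rec[1] for rec in balanced_train)
```

In this sketch only the training set is balanced; the validation set keeps its original class distribution so that accuracy figures are not inflated by duplicated profiles.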

Results and Discussion

Covariate selection identified 10 informative covariates as input variables: geomorphological maps, geological information, and features extracted from the digital elevation model (DEM), including analytical hillshading (AHS), sunrise, valley depth (VD), LS factor, channel network distance (CND), topographic wetness index (TWI), and multi-resolution ridge top flatness (MRRTF). Based on the profile analysis, the soils of the region were grouped at the subgroup level into five classes with an imbalanced distribution: Typic Calcixerepts, Typic Haploxerepts, Gypsic Haploxerepts, Typic Xerorthents, and Lithic Xerorthents. Overall accuracy and the Kappa index were 65% and 0.32 for RF, 60% and 0.35 for BRT, and 65% and 0.41 for MNLR; for the ensemble model after data balancing they were 70% and 0.62. User's accuracy and producer's accuracy showed that, among the individual models, MNLR predicted the soil classes most accurately. Although the ensemble model predicted the minority soil classes well, the involvement of the two weaker models (RF and BRT) gave it lower values than the individual MNLR model for some majority classes, especially Typic Haploxerepts and Typic Xerorthents.
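For readers unfamiliar with these metrics, the sketch below computes overall accuracy, the Kappa index, and producer's and user's accuracy from a confusion matrix. The three-class matrix is hypothetical and is not the study's validation data; the function name is likewise an illustrative choice.

```python
def accuracy_metrics(matrix):
    """Compute accuracy metrics from a square confusion matrix.

    matrix[i][j] is the number of validation profiles whose reference
    class is i and whose predicted class is j.
    """
    k = len(matrix)
    n = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(k))
    row_totals = [sum(matrix[i]) for i in range(k)]
    col_totals = [sum(matrix[i][j] for i in range(k)) for j in range(k)]

    overall = correct / n
    # Agreement expected by chance, from the row and column marginals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)
    kappa = (overall - expected) / (1 - expected)

    # Producer's accuracy: correct / reference total (per row).
    producers = [matrix[i][i] / row_totals[i] if row_totals[i] else None
                 for i in range(k)]
    # User's accuracy: correct / predicted total (per column).
    users = [matrix[j][j] / col_totals[j] if col_totals[j] else None
             for j in range(k)]
    return {"overall": overall, "kappa": kappa,
            "producers": producers, "users": users}

# Hypothetical confusion matrix for 30 validation profiles, three classes.
hypothetical_matrix = [
    [11, 3, 1],
    [3, 6, 1],
    [2, 1, 2],
]
metrics = accuracy_metrics(hypothetical_matrix)
```

Because Kappa discounts the agreement expected by chance, two models with the same overall accuracy can have different Kappa values, as RF and MNLR do in the results above.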

Conclusions

In summary, the learning algorithms did not achieve high accuracy in the spatial prediction of soil classes when applied individually. Combined in an ensemble model, they achieved higher overall accuracy and fewer misclassifications, especially at the subgroup level. Although each individual model predicts particular soil classes best, the ensemble outperformed the individual models in overall performance, which underscores the value of ensemble modeling for spatial soil classification.
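One common way to combine several classifiers is a plurality vote over their predictions. The abstract does not state how the ensemble combined MNLR, RF, and BRT, so the following is a hypothetical illustration of the general idea rather than the authors' procedure; the example predictions are invented.

```python
from collections import Counter

def majority_vote(predictions_per_model):
    """Combine class predictions from several models by plurality vote.

    predictions_per_model: list of equal-length lists, one per model.
    Ties are broken in favour of the earliest-listed model's prediction.
    """
    combined = []
    for votes in zip(*predictions_per_model):
        counts = Counter(votes)
        top = max(counts.values())
        # First vote, in model order, that reaches the highest count.
        combined.append(next(v for v in votes if counts[v] == top))
    return combined

# Invented predictions for three validation profiles.
mnlr = ["Typic Calcixerepts", "Typic Haploxerepts", "Lithic Xerorthents"]
rf = ["Typic Calcixerepts", "Typic Calcixerepts", "Typic Xerorthents"]
brt = ["Typic Haploxerepts", "Typic Calcixerepts", "Gypsic Haploxerepts"]

# MNLR is listed first so that it decides three-way ties.
ensemble = majority_vote([mnlr, rf, brt])
```

Listing the strongest individual model first is a design choice of this sketch: when all three models disagree, its prediction is kept.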

Keywords:

Boosted Regression Trees, Data balancing, Imbalanced dataset, Minority

Language:

Persian

Published:

Journal of Agricultural Engineering, Volume 46, Issue 3, 2024

Pages:

289 to 308

https://www.magiran.com/p2677316

Mohammad Amir Delavar

Corresponding Author (2)

Associate Professor Soil science, Agriculture, University of Zanjan, Zanjan, Iran

Accredited scientific journal

Journal of Agricultural Engineering

Quarterly journal, agriculture and natural resources

ISSN: 2588-5944 eISSN: 2588-526X

Publisher:

Shahid Chamran University

Managing Director:

Dr. Ataallah Khademalrasoul

Editor-in-Chief:

Dr. Mostafa Chorom

Journal phone: 061-33364052 (ext. 3090)
