Comparison of Machine Learning Methods in the Selection of Predictors of Atmospheric-Ocean General Circulation Models

Article Type:
Case Study (accredited ranking)
Abstract:
Introduction

Climate change is now one of the major challenges in the exploitation and management of water resources. Temperature, along with precipitation, is among the most important climatic elements and a main factor in climatic zoning and classification. Because Iran lies within the drought belt and close to the high-pressure tropical zone, it has an arid and semi-arid climate and experiences drought in most years. Temperature fluctuations and variability are therefore important issues, and the study of temperature change is a necessity. In the current study, four data mining algorithms for selecting predictors in the downscaling of maximum temperature at the Birjand synoptic station were studied and compared, and the superior algorithm was identified. Because the number of large-scale features is high, the choice of machine learning algorithm plays an important role in the statistical downscaling of climatic variables such as maximum temperature.

Materials and Methods

In environmental studies today, datasets typically describe a climatic phenomenon with many variables. Because the volume of data is large, choosing the predictors is one of the most important preprocessing steps in machine learning. Predictor selection is a mechanism for finding a combination of predictors that, with the minimum number of predictors, yields an acceptable evaluation index in estimating the variable under study. In this study, four machine learning methods, namely Simultaneous Perturbation Stochastic Approximation (SPSA), Least Absolute Shrinkage and Selection Operator (LASSO), Ridge regression, and the Gradient Boosting Machine (GBM), were studied and compared for selecting important features in the downscaling of maximum temperature at the Birjand synoptic station over the statistical period 1961-2019. The weather data for the Birjand synoptic station were provided by the Meteorological Organization of Iran. To calibrate and validate the machine learning algorithms, 70% and 30% of the available monthly data were allocated, respectively. The analysis was coded in the RStudio environment using the caret and fscaret packages. Three indices were used to evaluate the performance of the algorithms: relative Nash-Sutcliffe Efficiency (rNSE), Volumetric Efficiency (VE), and Kling-Gupta Efficiency (KGE).
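The three evaluation indices can be illustrated with a short sketch. The study itself was carried out in R, so the Python below is only an illustration; it assumes the commonly cited definitions of rNSE, VE, and KGE (the abstract does not state the formulas), and the function names and toy values are hypothetical. rNSE and VE divide by the observations, so the sketch assumes strictly positive observed values.

```python
import math
from statistics import mean, pstdev


def rnse(obs, sim):
    """Relative Nash-Sutcliffe Efficiency (assumed standard form).

    Errors are scaled by each observation, so obs must be non-zero.
    """
    m = mean(obs)
    num = sum(((s - o) / o) ** 2 for o, s in zip(obs, sim))
    den = sum(((o - m) / m) ** 2 for o in obs)
    return 1.0 - num / den


def volumetric_efficiency(obs, sim):
    """Volumetric Efficiency: 1 minus total absolute error over total observed."""
    abs_err = sum(abs(s - o) for o, s in zip(obs, sim))
    return 1.0 - abs_err / sum(obs)


def kge(obs, sim):
    """Kling-Gupta Efficiency from correlation, variability ratio, and bias ratio."""
    mo, ms = mean(obs), mean(sim)
    so, ss = pstdev(obs), pstdev(sim)
    cov = sum((o - mo) * (s - ms) for o, s in zip(obs, sim)) / len(obs)
    r = cov / (so * ss)      # linear correlation
    alpha = ss / so          # variability ratio
    beta = ms / mo           # bias ratio
    return 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


# Synthetic monthly maximum temperatures (not data from the study).
observed = [10.0, 20.0, 30.0]
simulated = [11.0, 20.0, 30.0]
scores = {
    "rNSE": rnse(observed, simulated),
    "VE": volumetric_efficiency(observed, simulated),
    "KGE": kge(observed, simulated),
}
```

All three indices equal 1 for a perfect simulation and decrease as the simulated series departs from the observations, which is why they can be compared side by side across the four algorithms.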

Results and Discussion

Before the algorithms were applied to select large-scale predictors, the correlation between these variables and the observed maximum temperature at Birjand station was examined. The large-scale variables mslp, P1_v, P8_v, P8_u, and P850 Temp, with a maximum correlation of 0.6 with temperature, showed acceptable correlation given the complexity of the climate change phenomenon. In addition, all the algorithms assigned an importance of more than 50% to the factors F1, F2, F15, F16, F18, F20, and F26, and the first variable (mean sea level pressure) was the most important parameter in downscaling maximum temperature. The highest importance was for P1_v and the lowest for P5_u, at 73.2% and 15%, respectively. Violin plots of the downscaled maximum temperature in the validation step, shown alongside the observed maximum temperature at the Birjand synoptic station, indicated that the first and third quartiles of the SPSA output were closer to the observed data than those of the other algorithms. According to the evaluation criteria, the SPSA algorithm outperformed the other algorithms in reproducing monthly maximum temperature values at the station. Based on the VE and rNSE criteria, the GBM algorithm was more successful in selecting predictors than the Ridge and LASSO algorithms. The SPSA algorithm also produced results that differed from those of the other algorithms. In comparing the mean and variance of the downscaled and observed maximum temperature, the t-test and F-test results showed that, at the 5% significance level, the SPSA algorithm reproduced the mean and variance of the observed maximum temperature at the Birjand synoptic station more effectively than the other algorithms.
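The preliminary correlation screening described at the start of this section can be sketched as follows. This is a minimal Python illustration rather than the R code used in the study; the function names are hypothetical, and the predictor values are synthetic numbers attached to variable names from the abstract purely for readability.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def rank_predictors(predictors, target):
    """Rank predictors by absolute correlation with the target, strongest first.

    predictors: dict mapping a predictor name to its list of values.
    Returns a list of (name, correlation) pairs.
    """
    scores = {name: pearson(values, target) for name, values in predictors.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


# Synthetic example: values are made up, only the names echo the abstract.
tmax = [1.0, 2.0, 3.0, 4.0, 5.0]
large_scale = {
    "mslp": [5.0, 3.0, 4.0, 1.0, 2.0],
    "P1_v": [2.0, 4.0, 6.0, 8.0, 10.0],
    "P5_u": [1.0, 0.0, 1.0, 0.0, 1.0],
}
ranking = rank_predictors(large_scale, tmax)
```

Ranking by absolute value keeps strongly negative correlations near the top, since a predictor that moves opposite to maximum temperature can be as informative as one that moves with it.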

Conclusion

The data used in this study comprised large-scale atmospheric variables and the observed maximum temperature at Birjand station. The algorithms were used to select important predictors, and their performance was assessed in the validation step. According to the results, P1_v had the highest importance among the large-scale variables (73.2%) and P5_u the lowest (15%). The SPSA algorithm also performed better than the other algorithms in selecting predictors and, consequently, in downscaling maximum temperature.

Language:
Persian
Published:
Journal of Water and Soil, Volume 37, Issue 1, 2023
Pages:
129 to 143
magiran.com/p2573194  