Comparing the performance of the multiple linear regression classic method and modern data mining methods in annual rainfall modeling (Case study: Ahvaz city)

Article Type:
Case Study (accredited ranking)
Abstract:
Introduction

Predicting hydrological variables, especially precipitation, is essential to the management and planning of water resources, so accurate estimation methods have long been of interest to researchers. Given the water crisis in different regions, rainfall and the resulting runoff need to be predicted with a range of methods so that water distribution can be managed comprehensively and appropriately. Researchers have developed and applied many methods for predicting hydrological variables. Classical methods such as multiple linear regression have been among the most important and widely used of these, particularly for precipitation, and have produced good results. More recently, data mining methods have been developed for the same purpose. This research compares the performance of classical multiple linear regression with that of modern data mining methods in modeling the annual rainfall of Ahvaz city, and identifies the best-performing model.

Materials and Methods

In this study, the annual rainfall of Ahvaz city was investigated and modeled. Meteorological data from the Ahvaz station were collected over a period of 30 years (1992-2021). The data were validated with tests of homogeneity, normality, trend, and outliers. Annual rainfall was then modeled with Multiple Linear Regression (MLR), Principal Component Analysis (PCA), Gene Expression Programming (GEP), and Support Vector Machine (SVM). Finally, the accuracy and performance of the models were compared using the coefficient of determination (R2), Root Mean Square Error (RMSE), Nash-Sutcliffe Efficiency (NSE), and the Willmott index (WI).
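The four comparison criteria named above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the abstract does not state the exact formulas, so the sketch assumes the definitions commonly used in hydrology (R2 as the squared Pearson correlation, WI as Willmott's original index of agreement). The function names and the sample values are hypothetical and are not Ahvaz data.

```python
import math


def _mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


def rmse(obs, pred):
    """Root mean square error between observed and predicted values."""
    return math.sqrt(_mean([(o - p) ** 2 for o, p in zip(obs, pred)]))


def nse(obs, pred):
    """Nash-Sutcliffe efficiency: 1 - SSE / variance of observations about their mean."""
    mean_obs = _mean(obs)
    sse = sum((o - p) ** 2 for o, p in zip(obs, pred))
    sst = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - sse / sst


def willmott_index(obs, pred):
    """Willmott index of agreement (assumed: the original 1981 form)."""
    mean_obs = _mean(obs)
    sse = sum((o - p) ** 2 for o, p in zip(obs, pred))
    potential = sum(
        (abs(p - mean_obs) + abs(o - mean_obs)) ** 2 for o, p in zip(obs, pred)
    )
    return 1.0 - sse / potential


def r_squared(obs, pred):
    """Coefficient of determination (assumed: squared Pearson correlation)."""
    mean_obs, mean_pred = _mean(obs), _mean(pred)
    cov = sum((o - mean_obs) * (p - mean_pred) for o, p in zip(obs, pred))
    var_obs = sum((o - mean_obs) ** 2 for o in obs)
    var_pred = sum((p - mean_pred) ** 2 for p in pred)
    return cov ** 2 / (var_obs * var_pred)


# Hypothetical annual rainfall totals, for illustration only.
observed = [200.0, 150.0, 300.0, 250.0]
predicted = [210.0, 140.0, 290.0, 260.0]

scores = {
    "R2": r_squared(observed, predicted),
    "RMSE": rmse(observed, predicted),
    "NSE": nse(observed, predicted),
    "WI": willmott_index(observed, predicted),
}
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

Under these definitions a perfect model gives R2 = NSE = WI = 1 and RMSE = 0, so higher R2, NSE, and WI together with lower RMSE indicate a better fit.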

Results and Discussion

In this study, XLSTAT software was used to model rainfall with multiple linear regression. For the SVM model, several kernel functions can be examined; the linear kernel and the second- and third-degree polynomial kernels, which are commonly used in hydrology, were selected, and the best settings for each kernel were found by trial and error. According to these results, the support vector machine with a third-degree polynomial kernel was the optimal SVM configuration for precipitation modeling. Because gene expression programming can select the more effective variables and eliminate those with less influence, all eight input factors were supplied to the GEP model so that it could identify the meaningful variables. For further investigation, in addition to the program's default set of mathematical operators (F1), a set based on the four main operators (F2) and the operator sets F3 and F4 were used.

The validation tests for homogeneity, trend, normality, and outliers showed that the recorded data were of good quality and could be used with a high degree of confidence in the rest of the study. The comparison of the models showed that PCA and GEP, with R2=0.85, NSE=0.85, and WI=0.96 and a very small difference in RMSE (35.49 and 35.70, respectively), predicted the annual rainfall of Ahvaz with better performance and higher accuracy than the other models. Considering the water crisis in different regions of the country, especially in Ahvaz, it is suggested that the methods introduced in this research be used to predict rainfall and the resulting runoff, so that water distribution can be managed comprehensively and appropriately.

Conclusion

This research compared classical statistical methods with several modern data mining methods in forecasting the annual rainfall of Ahvaz city. Hydrological data from the Ahvaz synoptic meteorological station were collected over a period of 30 years (1371-1400 in the Solar Hijri calendar, i.e. 1992-2021) and were first verified using homogeneity, trend, normality, and outlier tests. The results showed that the recorded data were of good quality and could be used with a high degree of confidence. Multiple linear regression (MLR), principal component analysis (PCA), gene expression programming (GEP), and support vector machine (SVM) methods were used to model precipitation. The model results were compared using the coefficient of determination (R2), root mean square error (RMSE), Nash-Sutcliffe efficiency (NSE), and the Willmott index (WI). Principal component analysis and gene expression programming, with R2 equal to 0.85, NSE equal to 0.85, WI equal to 0.96, and a very small difference in RMSE (35.49 and 35.70, respectively), performed better and more accurately than the other models. Based on these results, it is suggested that future research use modern data mining methods alongside classical statistical methods, and that attention be paid to choosing the optimal functions and parameters of each model in order to achieve the best results. Considering the water crisis in different parts of the country, especially in Ahvaz, it is suggested that the methods introduced in this research be used to predict rainfall and the resulting runoff, so that water distribution can be managed comprehensively and appropriately.

Language:
Persian
Published:
Journal of Water and Soil Management and Modeling, Volume:3 Issue: 2, 2023
Pages:
125 to 142
magiran.com/p2562590  