Evaluating the Performance of Machine Learning Models for Predicting Suspended Sediment Load (case study: Taleghan watershed, Iran)
Accurate prediction of suspended sediment load (SSL) in rivers is crucial for sustainable water resources management. Sediment is a major concern because it degrades aquatic ecosystems, shortens the lifespan of hydraulic structures, and deteriorates water quality. Given the complexity of hydrological processes and the interplay of the many factors affecting sediment production and transport, accurate modeling of this phenomenon has always been challenging. In recent years, advances in machine learning techniques have enabled the development of highly accurate predictive models. This study aims to improve the understanding of sediment dynamics in watersheds and to provide more precise tools for SSL prediction by applying machine learning models to daily SSL forecasting in the Taleghan watershed. The Taleghan watershed, characterized by its mountainous topography and specific hydrological conditions, was selected because of its significance in supplying water to the Tehran region, which makes accurate SSL prediction there especially important.
For this study, hydrological data including discharge, precipitation, snow index, and suspended sediment load were collected from the Gelink Taleghan hydrometric station over the period 2000 to 2018. After quality checks, the data were preprocessed by standardization, normalization, and outlier removal before being used to train and test the machine learning models. Principal Component Analysis (PCA) was employed to reduce data dimensionality and select the most significant input variables. PCA identifies new linear combinations of the original variables that explain the maximum variance in the data, so the number of model inputs can be reduced, which in turn decreases model training time and complexity. Six machine learning models, namely xgbTree, Cubist, qrnn, Ctree, Cforest, and LASSO, were used in this study. These models were chosen for their ability to model nonlinear and complex relationships between variables. Model performance was evaluated with statistical metrics including Root Mean Squared Error (RMSE), Nash-Sutcliffe Efficiency (NSE), and Mean Absolute Error (MAE). Additionally, time series plots, scatter plots, and Taylor diagrams were used for visual assessment of model performance.
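The three evaluation metrics named above can be sketched as follows. This is a minimal illustration using the standard textbook definitions of RMSE, MAE, and NSE, not the study's own code; the function names and the sample values are hypothetical and are not data from the Taleghan watershed.

```python
import math


def rmse(observed, predicted):
    """Root Mean Squared Error: square root of the mean squared residual."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def mae(observed, predicted):
    """Mean Absolute Error: mean of the absolute residuals."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def nse(observed, predicted):
    """Nash-Sutcliffe Efficiency: 1 minus the ratio of the residual sum of
    squares to the sum of squared deviations of observations from their mean.
    A value of 1 means a perfect fit; 0 means the model is no better than
    predicting the observed mean."""
    mean_obs = sum(observed) / len(observed)
    residual_ss = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    total_ss = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - residual_ss / total_ss


# Hypothetical sample values for illustration only (not study data).
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 39.0]

print("RMSE:", rmse(observed, predicted))
print("MAE: ", mae(observed, predicted))
print("NSE: ", nse(observed, predicted))
```

Lower RMSE and MAE indicate smaller errors in the units of the target variable, whereas NSE is dimensionless, so it is the easier of the three to compare across stations or studies.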
Results indicated that the xgbTree, Cubist, and qrnn models outperformed the other models in predicting SSL and effectively simulated daily variations in SSL. Overall, the xgbTree model demonstrated the best performance in SSL prediction. A comparison of the models' predictive accuracy revealed that ML algorithms can successfully predict daily SSL, particularly xgbTree (RMSE = 63.301, NSE = 0.97), Cubist (RMSE = 46.330, NSE = 0.96), and qrnn (RMSE = 85.349, NSE = 0.96), which exhibited the lowest prediction errors and highest efficiency metrics. Taylor diagrams further confirmed that the xgbTree, Cubist, and qrnn models fit the observed data better. Sensitivity analysis revealed that discharge and precipitation had the most significant influence on SSL variations. Additionally, employing PCA to reduce data dimensionality and select key variables played a crucial role in enhancing model performance. The findings of this study demonstrate that machine learning models can serve as powerful tools for predicting SSL in watersheds. Furthermore, the results indicated that machine learning models are capable of identifying complex and nonlinear patterns in hydrological data that cannot be detected using traditional modeling techniques. This is particularly significant in scenarios where climate change and human activities impact hydrological processes.
This study employed machine learning models to forecast daily suspended sediment load (SSL) in the Taleghan watershed. Results indicated that xgbTree, Cubist, and qrnn models exhibited superior performance in predicting SSL. These models can serve as powerful management tools for assessing the impacts of climate change and human activities on sedimentation processes, and for planning and managing water resources sustainably. The findings of this study demonstrate that machine learning models can be effective alternatives to traditional modeling methods for SSL prediction. These models can capture complex nonlinear relationships between input and output variables, leading to more accurate forecasts. However, to enhance predictive accuracy, further research is needed using higher spatial and temporal resolution data, as well as considering additional factors influencing sedimentation processes. For future research, it is recommended to explore hybrid models combining machine learning and physical models. Additionally, deep learning techniques can be employed to extract more complex features from data. Furthermore, investigating the impacts of climate change on sedimentation processes and developing predictive models under various climate scenarios is a crucial topic for future research.