Choosing the Best Hierarchical Clustering Technique Based on Principal Components Analysis for Suspended Sediment Load Estimation

Abstract:
Introduction
Assessing watershed sediment load is necessary for controlling soil erosion and reducing the potential for sediment production. Inconsistent estimates of sediment amounts, together with the lack of long-term measurements, limit access to reliable data series of erosion rate and sediment yield. Observed suspended sediment load data can therefore be used to estimate soil loss in the upstream catchment. One valid approach to estimating soil erosion is to combine the recorded data of hydrometric stations with catchment characteristics, which can provide accurate predictions. For this purpose, identifying sub-watersheds that are similar in climate, physiography, geology, and land use can be useful in erosion control operations.
THEORETICAL FRAMEWORK: Clustering is a key step in estimating the amount of sediment in ungauged areas. Various methods and techniques have been used to determine the best number of clusters; however, studies that apply several clustering methods and select the best one are rare. The objective of the present study is therefore to determine the most important variables in sediment production and to compare the Single linkage, Ward, and β-Flexible methods for clustering the sub-watersheds of the Gorganroud and Qareh-Sou river basins in Golestan Province.
Methodology
The Gorganroud and Qareh Sou watersheds are located in the north-eastern part of Iran. Seventeen hydrometric stations with 24 years (1986–2010) of recorded discharge and suspended sediment load data were selected. The Grubbs and Beck method was used to detect outliers in the measured discharge data. The correlation method was used to fill missing data in the time series. The normality of the discharge and suspended sediment data was tested with the Kolmogorov-Smirnov test in order to choose a suitable trend analysis method: linear regression was used for normally distributed data and the Mann-Kendall tau method for non-normally distributed data. The autocorrelation function (ACF) test was used to check the data series for internal dependence (autocorrelation).
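The screening steps described above can be sketched roughly as follows. This is a minimal illustration, not the authors' procedure: the function names, the 0.05 significance level, and the synthetic series are all assumptions, and standardizing the series before the Kolmogorov-Smirnov test is a simplification.

```python
import numpy as np
from scipy import stats

def trend_test(series, alpha=0.05):
    """Pick a trend test from a normality check; return (method, p_value)."""
    x = np.asarray(series, dtype=float)
    t = np.arange(x.size)  # time index, one step per observation
    # Kolmogorov-Smirnov test of the standardized series against N(0, 1).
    z = (x - x.mean()) / x.std(ddof=1)
    is_normal = stats.kstest(z, "norm").pvalue > alpha
    if is_normal:
        # Normal data: significance of the linear regression slope.
        return "linear regression", stats.linregress(t, x).pvalue
    # Non-normal data: Kendall's tau between time and value,
    # which corresponds to the Mann-Kendall trend test.
    return "Mann-Kendall", stats.kendalltau(t, x).pvalue

def lag1_autocorrelation(series):
    """Sample autocorrelation at lag 1 (first ordinate of the ACF)."""
    x = np.asarray(series, dtype=float)
    x = x - x.mean()
    return float(np.sum(x[:-1] * x[1:]) / np.sum(x * x))

# Synthetic 24-value series with an upward trend (hypothetical data).
rng = np.random.default_rng(0)
synthetic = 0.5 * np.arange(24) + rng.normal(size=24)
method, p_value = trend_test(synthetic)
print(method, p_value, lag1_autocorrelation(synthetic))
```

A series whose trend p-value falls below the chosen level, or whose lag-1 autocorrelation is large, would be a candidate for exclusion before clustering.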
A set of 38 factors from five main categories was investigated to identify the independent variables controlling sediment yield. Principal Component Analysis (PCA) was used to determine the most effective variables. To find the best classification method, three techniques (Single linkage, Ward’s, and β-flexible) were examined in the study area. Single linkage, also called nearest neighbor, is a simple clustering method in which pairs of objects form clusters hierarchically, starting from the most similar pair and proceeding in descending order of similarity. Ward’s algorithm is one of the most frequently used techniques in regionalization studies of hydrological and climatological factors. β-Flexible is a generalized hierarchical method that forms groups by computing the distance from an external object (a point) to a group.
Many indices have been developed to examine the validity of clustering techniques based on finding an optimal partitioning. In the present study, the pseudo-F and Dunn’s indices were used to assess the accuracy of the clustering algorithms. Accurate clustering means having non-overlapping partitions. One of the most commonly used criteria for selecting the number of groups is maximization of the pseudo-F statistic. This statistic assumes that the data follow a multivariate normal distribution.
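A minimal sketch of the PCA-then-clustering workflow, assuming a 13 × 38 matrix of sub-watershed variables, is given below. The data are random placeholders, the choice of 3 clusters is arbitrary, and the helper names are hypothetical. Only Single linkage and Ward's method are shown, because SciPy's `linkage` does not provide a β-flexible option.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def pca_scores(X, n_components):
    """Return component scores and explained-variance ratios via SVD."""
    X = np.asarray(X, dtype=float)
    # Standardize each variable so that units do not dominate the PCA.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    U, s, _ = np.linalg.svd(Z, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    return U[:, :n_components] * s[:n_components], explained[:n_components]

def cluster_labels(scores, method, n_clusters):
    """Build a hierarchical tree and cut it into at most n_clusters groups."""
    tree = linkage(scores, method=method)  # Euclidean distance by default
    return fcluster(tree, t=n_clusters, criterion="maxclust")

# Placeholder data: 13 sub-watersheds x 38 variables (not the study's data).
rng = np.random.default_rng(0)
X = rng.normal(size=(13, 38))

scores, explained = pca_scores(X, n_components=7)
labels = {m: cluster_labels(scores, m, n_clusters=3) for m in ("single", "ward")}
for method_name, lab in labels.items():
    print(method_name, lab)
```

Clustering on the component scores rather than on the raw variables is one possible design; clustering on one representative variable per component would be an equally plausible reading of the text.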
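The two validity indices can be written out as a short sketch. The formulas used here are the common textbook forms (pseudo-F as the ratio of between-cluster to within-cluster mean squares, and Dunn's index as the smallest between-cluster distance divided by the largest cluster diameter); the abstract does not state which variants the study used, so this is an assumption, as are the function names and the toy data.

```python
import numpy as np
from scipy.spatial.distance import cdist

def pseudo_f(X, labels):
    """Pseudo-F: (between SS / (k - 1)) / (within SS / (n - k))."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    groups = np.unique(labels)
    n, k = X.shape[0], groups.size
    grand_mean = X.mean(axis=0)
    between = within = 0.0
    for g in groups:
        members = X[labels == g]
        centroid = members.mean(axis=0)
        between += len(members) * np.sum((centroid - grand_mean) ** 2)
        within += np.sum((members - centroid) ** 2)
    return (between / (k - 1)) / (within / (n - k))

def dunn_index(X, labels):
    """Dunn: min distance between clusters / max diameter within a cluster."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    groups = [X[labels == g] for g in np.unique(labels)]
    max_diameter = max(cdist(g, g).max() for g in groups)
    min_separation = min(
        cdist(a, b).min()
        for i, a in enumerate(groups)
        for b in groups[i + 1:]
    )
    return min_separation / max_diameter

# Toy example: two tight, well-separated clusters (hypothetical points).
toy_X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]])
toy_labels = np.array([1, 1, 2, 2])
print(pseudo_f(toy_X, toy_labels), dunn_index(toy_X, toy_labels))
```

For the toy points the pseudo-F is 200 and Dunn's index is 10; larger values of either index indicate more compact, better separated clusters.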
Results
All data series of the 17 sub-watersheds in the Gorganroud and Qareh Sou basins were tested with the different clustering algorithms. Two data series showed autocorrelation according to the ACF test, and two showed trends according to Kendall’s test. Therefore, 13 sub-watersheds remained for the final classification. The 38 independent variables were calculated and screened with PCA. Variables with similar effects on sediment yield were grouped into 7 components, which were selected according to the amount of variance they explained. The PCA results and the representative variables selected in each component are given in Table 1.
CONCLUSIONS & SUGGESTIONS: The results showed that the Single linkage method performed better with respect to the accuracy criterion. The suspended sediment values were determined using measured discharge and the available sediment rating curves; the identified clusters can therefore be regarded as a reliable and appropriate grouping of the watersheds and as a useful tool in watershed management, particularly in the context of erosion and sedimentation.
Language:
Persian
Published:
Environmental Erosion Researches, Volume:6 Issue: 4, 2017
Pages:
47 to 67
magiran.com/p1697148  