Implementation of Agglomerative Hierarchical Clustering Algorithm Applying the Map-Reduce Parallel Approach
The map-reduce model is a parallel programming model for processing large data sets, and applications written in it can be executed on the cloud. Organizations are producing ever more data through business processes, user activities, website tracking, sensors, finance, accounting, and other sources. Data clustering algorithms are used as tools for analyzing such large volumes of data. Their main purpose is to group data into clusters so that the objects within a cluster are more similar to one another than to objects in other clusters. In this paper, an agglomerative hierarchical clustering algorithm, one of the data mining techniques, is implemented using the map-reduce design, and its results are compared with those of the conventional, non-parallel implementation. Experiments show that the runtime improvement grows as the input data size increases: the runtime of the algorithm improved by 16.80% for the dataset with 200 data points and by 29.26% for the dataset with 1000 data points. CPU usage in the parallel system also increased from 22% to 94%.
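The general idea can be illustrated with a minimal sketch. It is not the implementation evaluated in the paper: it assumes single linkage and Euclidean distance, and it uses Python's built-in `map` and `functools.reduce` in a single process as a stand-in for a distributed map-reduce framework. The names `map_pair`, `reduce_min`, and `agglomerate` are hypothetical. In each iteration the map step computes the distance for one pair of clusters, and the reduce step keeps the closest pair, which is then merged.

```python
from functools import reduce
from itertools import combinations
from math import dist


def map_pair(pair):
    """Map step: emit (distance, i, j) for one pair of clusters.

    Assumption: single linkage, i.e. the distance between two clusters
    is the smallest Euclidean distance between any two of their points.
    """
    (i, a), (j, b) = pair
    d = min(dist(p, q) for p in a for q in b)
    return (d, i, j)


def reduce_min(best, candidate):
    """Reduce step: keep the closest pair seen so far."""
    return candidate if candidate < best else best


def agglomerate(points, k):
    """Merge the two closest clusters repeatedly until k clusters remain."""
    # Start with every point in its own cluster, keyed by its index.
    clusters = {i: [p] for i, p in enumerate(points)}
    while len(clusters) > k:
        pairs = combinations(clusters.items(), 2)
        # The map calls are independent of one another, so this is the
        # part a real framework would distribute across workers.
        _, i, j = reduce(reduce_min, map(map_pair, pairs))
        clusters[i].extend(clusters.pop(j))
    return list(clusters.values())


points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (10.0, 0.0)]
result = agglomerate(points, 2)
print(result)
```

Because each pairwise distance is computed independently, the map step is the natural place to parallelize, which is consistent with the higher CPU usage reported for the parallel system.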