DynamicCluStream: An Algorithm Based on CluStream to Improve Clustering Quality
Article Type:
Research/Original Article (accredited ranking)
Abstract:
Data streams are continuous flows of data objects generated at high rates that must be processed in real time and in a single pass. Clustering algorithms play a vital role in analyzing data streams by grouping similar data samples. Among the window models used for evolving streams, the sliding window gradually moves over the data and focuses on the most recent information, which improves clustering accuracy while reducing memory requirements. Distributed computing frameworks such as Apache Spark have addressed the limitations of traditional tools in processing big data, including data streams. This paper presents DynamicCluStream, an enhancement of Spark-CluStream that uses a two-phase clustering approach and clusters recent data precisely. The algorithm determines the number of clusters dynamically by merging overlapping clusters during the offline phase, which substantially improves cluster precision. Experimental results show average precision improvements of up to 47 percent on the CoverType dataset and up to 92 percent on the PowerSupply dataset. Although removing data samples and merging clusters make the algorithm slower, this overhead is negligible in a distributed environment.
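The abstract does not state the exact overlap criterion used in the offline phase, so the following is only a minimal sketch of how merging overlapping clusters can let the number of clusters emerge from the data instead of being fixed in advance. It assumes CluStream-style cluster feature vectors (count `n`, per-dimension linear sum `ls`, and squared sum `ss`) and treats two clusters as overlapping when the Euclidean distance between their centroids is smaller than the sum of their RMS radii. The names `ClusterFeature`, `overlaps`, and `merge_overlapping` are hypothetical; this is not the Spark-CluStream API and it runs on a single machine without Spark.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class ClusterFeature:
    """Hypothetical CluStream-style summary: count, linear sum, squared sum."""
    n: int
    ls: List[float]
    ss: List[float]

    @classmethod
    def from_points(cls, points):
        dim = len(points[0])
        ls = [sum(p[d] for p in points) for d in range(dim)]
        ss = [sum(p[d] ** 2 for p in points) for d in range(dim)]
        return cls(len(points), ls, ss)

    def centroid(self):
        return [x / self.n for x in self.ls]

    def radius(self):
        # RMS deviation of the members from the centroid, derived from n, LS, SS
        c = self.centroid()
        var = sum(s / self.n - m * m for s, m in zip(self.ss, c))
        return math.sqrt(max(var, 0.0))

    def merge(self, other):
        # Cluster feature vectors are additive, so merging is element-wise addition
        return ClusterFeature(
            self.n + other.n,
            [a + b for a, b in zip(self.ls, other.ls)],
            [a + b for a, b in zip(self.ss, other.ss)],
        )


def overlaps(a, b):
    # Assumed overlap test: centroid distance below the sum of the two radii
    return math.dist(a.centroid(), b.centroid()) < a.radius() + b.radius()


def merge_overlapping(clusters):
    """Merge overlapping pairs until no pair overlaps; the survivors define k."""
    clusters = list(clusters)
    merged = True
    while merged:
        merged = False
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                if overlaps(clusters[i], clusters[j]):
                    clusters[i] = clusters[i].merge(clusters[j])
                    del clusters[j]
                    merged = True
                    break
            if merged:
                break
    return clusters


a = ClusterFeature.from_points([(0.0, 0.0), (2.0, 0.0)])      # centroid (1, 0), radius 1
b = ClusterFeature.from_points([(1.5, 0.0), (2.5, 0.0)])      # centroid (2, 0), radius 0.5
c = ClusterFeature.from_points([(10.0, 10.0), (10.0, 12.0)])  # centroid (10, 11), radius 1
result = merge_overlapping([a, b, c])
print(len(result))  # 2
```

Here `a` and `b` overlap (centroid distance 1 is less than 1 + 0.5) and are merged, while `c` stays separate, so three initial clusters collapse to two. Restarting the scan after every merge keeps the loop simple at the cost of quadratic passes; a production version over many micro-clusters would likely use a spatial index or a union-find pass instead.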
Language:
English
Published:
International Journal of Web Research, Volume:6 Issue: 2, Autumn-Winter 2023
Pages:
77 to 87
https://www.magiran.com/p2730482
Other Articles by the Author(s)
A Movie Recommender System Based on Topic Modeling using Machine Learning Methods
Mojtaba Kordabadi, Amin Nazari, *
International Journal of Web Research, Autumn-Winter 2022
DynamicEvoStream: An EvoStream-Based Algorithm for Dynamically Determining the Number of Clusters in Data Streams
Z. Amighi, M. Yousef Sanati *, M. Dezfoulian
Journal of Electrical Engineering, Autumn 2021
Satisfiability Checking of Clinical Practice Guidelines Using an Analyzer
*, Amirabbas Asadi
Modern Care Journal, Jan 2020