DynamicCluStream: An Algorithm Based on CluStream to Improve Clustering Quality

Article Type:
Research/Original Article (accredited ranking)
Abstract:
Data streams are continuous flows of data objects generated at high rates, which must be processed in real time and in a single pass. Clustering algorithms play a vital role in analyzing data streams by grouping similar data samples. Among the time-window models used for evolving streams, the sliding window moves gradually over the data and focuses on the most recent information, which improves clustering accuracy while reducing memory requirements. Distributed computing frameworks such as Apache Spark have addressed the limitations of traditional tools in processing big data, including data streams. This paper presents the DynamicCluStream algorithm, an enhancement of Spark-CluStream that uses a two-phase clustering approach and clusters recent data precisely. The algorithm determines the number of clusters dynamically by merging overlapping clusters during the offline phase, which significantly improves cluster precision. Experimental results show that it performs up to 47 percent better on average in terms of precision on the CoverType dataset and up to 92 percent better on average on the PowerSupply dataset. Although the algorithm is slower because of data sample removal and cluster integration, this overhead is negligible in a distributed environment.
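The merging step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state the overlap criterion or how merged clusters are summarized, so everything below is an assumption made for illustration: clusters are represented as hypothetical `(center, radius, weight)` tuples, two clusters are treated as overlapping when the distance between their centers is smaller than the sum of their radii, and the function name `merge_overlapping` is invented here. It is not the paper's implementation and does not use Spark.

```python
import math
from collections import defaultdict


def overlaps(a, b):
    # Assumed criterion: centers closer than the sum of the two radii.
    (ca, ra, _), (cb, rb, _) = a, b
    return math.dist(ca, cb) < ra + rb


def merge_overlapping(clusters):
    """Merge groups of overlapping clusters in one pass.

    clusters: list of (center, radius, weight), center being a tuple of floats.
    Returns a new list; its length is the resulting number of clusters.
    """
    parent = list(range(len(clusters)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Link every pair of overlapping clusters (overlap is applied transitively).
    for i in range(len(clusters)):
        for j in range(i + 1, len(clusters)):
            if overlaps(clusters[i], clusters[j]):
                parent[find(i)] = find(j)

    groups = defaultdict(list)
    for i, cluster in enumerate(clusters):
        groups[find(i)].append(cluster)

    merged = []
    for members in groups.values():
        total = sum(w for _, _, w in members)
        dim = len(members[0][0])
        # Weighted mean of the member centers.
        center = tuple(
            sum(c[d] * w for c, _, w in members) / total for d in range(dim)
        )
        # Radius large enough to enclose every member cluster.
        radius = max(math.dist(center, c) + r for c, r, _ in members)
        merged.append((center, radius, total))
    return merged


if __name__ == "__main__":
    example = [
        ((0.0, 0.0), 1.0, 10),
        ((1.5, 0.0), 1.0, 10),
        ((10.0, 10.0), 1.0, 5),
    ]
    print(len(merge_overlapping(example)), "clusters after merging")
```

In this sketch the number of clusters is not fixed in advance; it falls out of how many overlap groups exist, which is the general idea the abstract describes. The pairwise comparison is quadratic in the number of clusters, which is acceptable only when the cluster count is small.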
Language:
English
Published:
International Journal of Web Research, Volume:6 Issue: 2, Autumn-Winter 2023
Pages:
77 to 87
https://www.magiran.com/p2730482  
Authors:
  • Yousef Sanati, Morteza
    Corresponding Author (2)
    Assistant Professor, Computer Engineering, Faculty of Engineering, Bu-Ali Sina University, Hamedan, Iran
  • Mansoorizadeh, Muharram
    Author (3)
    Associate Professor, Computer Engineering, Bu-Ali Sina University, Hamedan, Iran