Presenting a Model of Data Anonymization in Big Data in the Context of In-Memory Processing Framework

Author(s):

E. Shamsinejad, T. Banirostam *, M. M. Pedram, A. Rahmani

Article Type:

Research/Original Article (accredited rank)

Abstract:

Background and Objectives

Nowadays, with the rapid growth of social networks, extracting valuable information from their voluminous data sources while protecting privacy and preventing the disclosure of unique data is among the most challenging problems. This paper presents a model for preserving privacy in big data.

Methods

The proposed model is implemented with Spark, an in-memory big data processing tool, in four steps. The first step loads the raw data from HDFS into RDDs. The second step determines m clusters and their cluster heads. The third step places the produced tuples in separate RDDs in parallel. The fourth step releases the anonymized clusters. The model is based on the K-means clustering algorithm, runs on the Spark framework, and uses the capabilities of the RDD and MLlib components. Determining the optimized cluster heads from each tuple's content, taking the data type into account and using the formula of the suggested solution, leads to releasing data in the optimized cluster with the lowest rate of data loss and identity disclosure.
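
As a rough illustration of the clustering and release steps, the following sketch groups numeric records with plain K-means and then releases each record as the value range of its cluster. It is only a minimal sketch under several assumptions that are not taken from the paper: it runs in plain Python rather than on Spark RDDs or MLlib, it handles numeric attributes only, it seeds the cluster heads with the first m records, and it replaces the paper's cluster-head formula (not given in the abstract) with ordinary centroid means. The names `kmeans`, `anonymize`, and `records` are hypothetical.

```python
from statistics import mean


def kmeans(points, m, iterations=20):
    """Plain Lloyd's K-means; the first m points seed the centroids (an assumption)."""
    centroids = [tuple(p) for p in points[:m]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        assignment = [
            min(range(m),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(m):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(mean(col) for col in zip(*members))
    return assignment


def anonymize(points, m, k):
    """Release each record as the per-attribute (min, max) range of its cluster.

    Clusters with fewer than k records are suppressed, so every released
    range is shared by at least k records.
    """
    assignment = kmeans(points, m)
    released = []
    for c in range(m):
        members = [p for p, a in zip(points, assignment) if a == c]
        if len(members) < k:
            continue  # too small to hide an individual; drop the cluster
        ranges = tuple((min(col), max(col)) for col in zip(*members))
        released.extend([ranges] * len(members))
    return released


# Hypothetical records with two numeric quasi-identifiers.
records = [(25, 50), (27, 52), (26, 51), (60, 90), (62, 95), (61, 93)]
released = anonymize(records, m=2, k=2)
```

On these six records the two clusters separate cleanly, so each record is published as one of two ranges, each shared by three records. In a Spark setting the assignment and update steps would be distributed across partitions; that part is deliberately left out here.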

Results

In the proposed model, which uses Spark framework factors and optimized clusters in the K-means algorithm, the execution time of the algorithm at different data sizes (in megabytes) depends on multiple expiration times and the purposeful elimination of clusters, and the data loss rate depends on two-level clustering. According to the simulation results, as the volume of data increases, the rate of data loss decreases compared with the FADS and FAST clustering algorithms, which is due to the increase in the number of records in the proposed model. With the formula presented in the proposed model, the problem of how to determine the multiple selected attributes is reduced. According to the presented results and 2-anonymity, the cost factor reaches its lowest value, 0.20, at k=9.
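
The abstract does not define how the data loss rate is computed. One common way to quantify it for range-generalized numeric data is the normalized width of each released range, averaged over all records and attributes. The sketch below shows that metric only as an assumption about what such a measure might look like, not as the paper's formula; `information_loss`, `original`, and `released` are hypothetical names and values.

```python
def information_loss(original, released):
    """Average normalized width of the released ranges.

    0.0 means no generalization; 1.0 means every value was widened
    to the full range of its attribute.
    """
    # Full span of each attribute in the original data.
    spans = [max(col) - min(col) for col in zip(*original)]
    total = 0.0
    for ranges in released:
        for (lo, hi), span in zip(ranges, spans):
            total += (hi - lo) / span if span else 0.0
    return total / (len(released) * len(spans))


# Hypothetical original records and their generalized release.
original = [(25, 50), (27, 52), (26, 51), (60, 90), (62, 95), (61, 93)]
released = (
    [((25, 27), (50, 52))] * 3    # first cluster, three records
    + [((60, 62), (90, 95))] * 3  # second cluster, three records
)
loss = information_loss(original, released)
```

Tighter clusters give narrower ranges and therefore a lower score, which is the kind of trade-off against disclosure risk that the results above describe.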

Conclusion

The proposed model provides the right balance among high-speed execution, minimal data loss, and minimal data disclosure. It also presents a parallel algorithm that increases the efficiency of anonymizing data streams while decreasing the information loss rate.

Keywords:

Big Data, Anonymity, Confidentiality, Data Disclosure, Privacy

Language:

English

Published:

Journal of Electrical and Computer Engineering Innovations, Volume 12, Issue 1, Winter-Spring 2024

Pages:

79 to 98

magiran.com/p2658595

Accredited scientific journal

Journal of Electrical and Computer Engineering Innovations

Biannual technical and engineering journal published in English

ISSN: 2322-3952

License holder:

Shahid Rajaee Teacher Training University

Managing Director:

Dr. Saeed Olyaee

Editor-in-Chief:

Reza Ebrahimpour

Journal phone: 021-22970060 (ext. 2234)
