Design and Preparation of Persian Labeled Dataset from COVID-19 News for Fake News Detection

Article Type:
Research/Original Article (with accredited ranking)
Abstract:
Fake news detection using content features has attracted many researchers in the last few years. These approaches rely mainly on news datasets and on analyzing the style and content of the news. Although some fake news datasets exist in English, fake news detection in the Persian language suffers from a lack of suitable datasets. This article introduces a manually labeled Persian fake news dataset containing about 5000 posts related to COVID-19, extracted from the Telegram messenger. The dataset was built in two stages: 1) data collection and pre-processing; and 2) manual labeling using a fixed rule set and an established framework. The labeling stage comprises seven tasks: 1) Factual; 2) Hate, blame, and negative speech; 3) Raising morale, encouragement, and advice; 4) Political news; 5) Death statistics; 6) Cure, medicine, and health care; and 7) Worth considering for fact-checking. Each labeling task uses three labels: “Yes”, “No”, and “Can’t decide”. The main labeling task, i.e. the “Factual” task, is assigned to two annotators; if they disagree, the label assigned by a third annotator is accepted. The kappa measure of inter-annotator agreement is 0.706, which falls in the substantial range. This dataset is about 10 times larger than similar Persian datasets and can be used not only for fake news studies but also for other Persian Natural Language Processing (NLP) studies.
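The agreement and adjudication steps described in the abstract can be illustrated with a minimal sketch. It assumes the kappa measure is Cohen's kappa for two annotators, which the abstract does not state explicitly. The function names `cohen_kappa` and `adjudicate` and the toy labels are hypothetical and are not taken from the dataset.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items.

    Assumption: the abstract only says "kappa measure"; Cohen's kappa
    is used here because the main task has two annotators.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's label frequencies.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


def adjudicate(label_a, label_b, label_c):
    """Keep the shared label; on disagreement accept the third annotator's."""
    return label_a if label_a == label_b else label_c


# Toy data (hypothetical), using the three labels named in the abstract.
annotator_a = ["Yes", "Yes", "No", "No"]
annotator_b = ["Yes", "No", "No", "No"]
annotator_c = ["Yes", "Can't decide", "No", "No"]

kappa = cohen_kappa(annotator_a, annotator_b)
final_labels = [
    adjudicate(a, b, c)
    for a, b, c in zip(annotator_a, annotator_b, annotator_c)
]
```

On the toy data the two annotators agree on three of four items. Applied to the real annotations, this computation would be expected to reproduce the reported 0.706 only if the authors did in fact use Cohen's kappa on the two primary annotators.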
Language:
Persian
Published:
Language and Linguistics, Volume:19 Issue: 37, 2024
Pages:
173 to 192
magiran.com/p2708093  