Author gender identification from text using Bayesian Random Forest

Article Type:
Research/Original Article (accredited ranking)
Abstract:
The heavy use of virtual environments, and the connections users form through social networks such as Facebook, Instagram, and Twitter, make it more necessary than ever to identify shared subjects in these environments. Several applications benefit from reliable methods for inferring the age and gender of social media users. Such applications span a wide range of fields, from personalized advertising to law enforcement and reputation management. Text posts represent a large portion of user-generated content and contain information that can help uncover undisclosed user attributes or check the honesty of self-reported age and gender. Because most information is exchanged as text, identifying author attributes such as age, gender, and political and religious opinions from this content becomes all the more important. Gender identification, which can be useful in security and marketing, answers the following question: given a short text document, can we identify whether the author is male or female? This question is motivated by recent events in which people faked their gender on the Internet. In this paper, author gender identification in blog data is investigated. To this end, four groups of features are employed: syntactic features, word-based features, character-based features, and function words. In addition, character n-gram features are used to improve classification accuracy. To evaluate the proposed method, 3212 texts were collected from Technorati.com and blogger.com. Experimental results demonstrate that these types of features are practical. Furthermore, a new classification method called "Bayesian Random Forest" is introduced. Each tree in a Bayesian Random Forest is a Bayes tree.
The experimental results show that this method achieves notable results in comparison with other classification algorithms such as Naïve Bayes, Naïve Bayes Tree, and Random Forest, and that it raises the accuracy of gender identification to 89.5%.
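The two ideas named in the abstract, character n-gram features and an ensemble built from Bayesian base learners, can be illustrated with a minimal sketch. This is not the paper's Bayesian Random Forest: the abstract does not specify how a Bayes tree is built, so the sketch substitutes flat multinomial Naive Bayes learners, each trained on a bootstrap sample and a random feature subset and combined by majority vote. All names here (char_ngrams, MultinomialNB, BaggedNaiveBayes, n_estimators, feature_frac) are hypothetical and chosen only for illustration.

```python
import math
import random
from collections import Counter


def char_ngrams(text, n=3):
    """Count overlapping character n-grams in the lowercased text."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


class MultinomialNB:
    """Multinomial Naive Bayes with add-one smoothing, restricted to a vocabulary."""

    def fit(self, docs, labels, vocab):
        self.vocab = vocab
        self.classes = sorted(set(labels))
        self.log_prior, self.log_like = {}, {}
        for c in self.classes:
            rows = [d for d, y in zip(docs, labels) if y == c]
            self.log_prior[c] = math.log(len(rows) / len(docs))
            totals = Counter()
            for d in rows:
                for feat, count in d.items():
                    if feat in vocab:
                        totals[feat] += count
            denom = sum(totals.values()) + len(vocab)
            self.log_like[c] = {f: math.log((totals[f] + 1) / denom) for f in vocab}
        return self

    def predict(self, doc):
        # Features outside this learner's vocabulary are ignored.
        scores = {
            c: self.log_prior[c]
            + sum(v * self.log_like[c][f] for f, v in doc.items() if f in self.vocab)
            for c in self.classes
        }
        return max(scores, key=scores.get)


class BaggedNaiveBayes:
    """Assumed stand-in for the ensemble: bootstrap rows, random feature subsets, majority vote."""

    def __init__(self, n_estimators=25, feature_frac=0.5, seed=0):
        self.n_estimators = n_estimators
        self.feature_frac = feature_frac
        self.seed = seed

    def fit(self, docs, labels):
        rng = random.Random(self.seed)
        vocab = sorted({f for d in docs for f in d})
        k = min(len(vocab), max(1, int(self.feature_frac * len(vocab))))
        self.models = []
        for _ in range(self.n_estimators):
            idx = [rng.randrange(len(docs)) for _ in docs]  # sample rows with replacement
            sub_vocab = set(rng.sample(vocab, k))           # random feature subset
            self.models.append(
                MultinomialNB().fit([docs[i] for i in idx], [labels[i] for i in idx], sub_vocab)
            )
        return self

    def predict(self, doc):
        votes = Counter(m.predict(doc) for m in self.models)
        return votes.most_common(1)[0][0]
```

In use, each text would be converted with char_ngrams and the resulting counters passed to BaggedNaiveBayes.fit together with the author labels. The syntactic, word-based, and function-word features mentioned in the abstract could be merged into the same counters, though how the paper combines them is not stated here.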
Language:
Persian
Published:
Signal and Data Processing, Volume:16 Issue: 1, 2019
Pages:
143 to 157
magiran.com/p2003601  