Using an Automatic Weighted Keywords Dictionary for Intelligent Web Content Filtering

Abstract:
Filtering web pages with inappropriate content is one of the major issues in intelligent network security. Any country that wants to control users' access to the web needs an intelligent filtering method with high accuracy and speed, so the problem has attracted many researchers. Representing web pages in a machine-understandable form is one of the most important preprocessing steps. A way to describe web pages with fewer dimensions would therefore be very effective, especially for deciding whether a page should be filtered out. In this paper, we propose an automatic method for detecting forbidden keywords in web pages. We then define a new vector representation of web pages, named RWSF, which consists of the weighted sum and the frequency of forbidden keywords in different parts of a page. This representation relies on a ranking dictionary of keywords that includes the forbidden keywords. To evaluate the proposed method, we used 2643 pages: 1311 normal pages and 1332 forbidden pages. Of these, 1851 pages (about 70%) were used to train the system and 792 pages (about 30%) to evaluate it. The system was assessed with several classifiers: k-Nearest Neighbor, Support Vector Machines, Decision Tree, and Artificial Neural Networks. The evaluation results indicate high efficiency and accuracy for the proposed method with all of these classifiers.
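
The abstract describes the keyword detection step and the RWSF representation only at a high level. The following is a minimal sketch under stated assumptions, not the authors' implementation. The page parts in `PARTS`, the helper names `tokenize`, `build_keyword_dictionary` and `rwsf_vector`, the smoothed log-ratio keyword score, and the `min_score` threshold are all hypothetical choices made for illustration. The sketch shows how a weighted keyword dictionary learned from labeled training pages can be turned into a fixed-length vector of (weighted sum, frequency) pairs, one pair per page part.

```python
import math
import re
from collections import Counter

# Hypothetical page parts; the abstract does not specify the paper's segmentation.
PARTS = ("title", "meta", "body")


def tokenize(text):
    """Lowercase ASCII word tokens (a deliberately simplistic, assumed tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def build_keyword_dictionary(pages, labels, min_score=1.0):
    """Assumed stand-in for automatic forbidden-keyword detection.

    Each word is scored by the smoothed log-ratio of its document frequency
    in forbidden (label 1) versus normal (label 0) training pages. Words
    scoring above min_score are kept, and the score is used as the word's weight.
    """
    df = {0: Counter(), 1: Counter()}
    n = Counter(labels)
    for page, label in zip(pages, labels):
        words = set()
        for part in PARTS:
            words.update(tokenize(page.get(part, "")))
        df[label].update(words)  # count each word once per page
    weights = {}
    for word in set(df[0]) | set(df[1]):
        p_forbidden = (df[1][word] + 1) / (n[1] + 2)  # Laplace smoothing
        p_normal = (df[0][word] + 1) / (n[0] + 2)
        score = math.log(p_forbidden / p_normal)
        if score > min_score:
            weights[word] = score
    return weights


def rwsf_vector(page, keyword_weights):
    """Concatenate [weighted sum, frequency] of dictionary keywords for each part."""
    vector = []
    for part in PARTS:
        tokens = tokenize(page.get(part, ""))
        hits = [keyword_weights[t] for t in tokens if t in keyword_weights]
        vector.extend([float(sum(hits)), float(len(hits))])
    return vector
```

For example, training on two forbidden pages that both mention "casino" and "slots" and on two unrelated normal pages keeps only those two words in the dictionary. A new page is then mapped to a 6-dimensional vector (2 values × 3 parts), whatever its length. Any of the classifiers named in the abstract could consume this fixed, low-dimensional input.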
Language:
English
Published:
Journal of Advances in Computer Research, Volume 6, Issue 1, Winter 2015
Pages:
101 to 114
magiran.com/p1374079  