A Density Based Clustering Approach to Distinguish Between Web Robot and Human Requests to a Web Server

Message:
Abstract:
Today world''s dependence on the Internet and the emerging of Web 2.0 applications significantly increased the requirement of web robots crawling the sites to support services and technologies. Regardless of the advantages of robots, they may occupy the bandwidth and reduce the performance of web servers. Despite a variety of researches, there is no accurate method for classifying huge data sets of web visitors in a reasonable amount of time. Moreover, this technique should be insensitive to the ordering of instances and produce deterministic accurate results. Therefore, this paper presents a density-based clustering approach using Density-Based Spatial Clustering of Applications with Noises (DBSCAN), to classify web visitors of two real large data sets. We propose two new features based on the behavioral patterns of visitors to describe them. What''s more, we consider 12 common features and use the significance of the difference test (T-test) to reduce the dimensions and overcome one of the disadvantages of DBSCAN. Based on the supervised evaluation metrics, the proposed algorithm has the 95% of Jaccard metric and produces two clusters having the entropy and purity rates of 0.024 and 0.97, respectively. Furthermore, from the standpoint of clustering quality and accuracy, the proposed method performs better than state-of the-art algorithms. Finally, it can be concluded that some known web robots through imitating human users make it difficult to be identified.
Language:
English
Published:
International Journal of Information Security, Volume:6 Issue: 1, Jan 2014
Page:
6
magiran.com/p1339887  
دانلود و مطالعه متن این مقاله با یکی از روشهای زیر امکان پذیر است:
اشتراک شخصی
با عضویت و پرداخت آنلاین حق اشتراک یک‌ساله به مبلغ 1,390,000ريال می‌توانید 70 عنوان مطلب دانلود کنید!
اشتراک سازمانی
به کتابخانه دانشگاه یا محل کار خود پیشنهاد کنید تا اشتراک سازمانی این پایگاه را برای دسترسی نامحدود همه کاربران به متن مطالب تهیه نمایند!
توجه!
  • حق عضویت دریافتی صرف حمایت از نشریات عضو و نگهداری، تکمیل و توسعه مگیران می‌شود.
  • پرداخت حق اشتراک و دانلود مقالات اجازه بازنشر آن در سایر رسانه‌های چاپی و دیجیتال را به کاربر نمی‌دهد.
In order to view content subscription is required

Personal subscription
Subscribe magiran.com for 70 € euros via PayPal and download 70 articles during a year.
Organization subscription
Please contact us to subscribe your university or library for unlimited access!