data mining algorithms
in journals of the Humanities group
Data mining is the process of searching large volumes of data for new, valuable, and previously unknown information. It is counted among the ten emerging fields expected to drive a technological revolution over the next decade, and it has expanded rapidly in recent years. The purpose of this research is to explore learners' behavior in interactive educational environments using data mining algorithms. Extracting meaningful relationships among different factors with these algorithms can yield solutions for improving learners' performance. The research instrument was a questionnaire; its validity was established through content analysis and its reliability through test-retest. The data set covers the activities and interactions of 200 active students in a university social network. After information about the learners' interactions was gathered, the Apriori algorithm was used to analyze learner behavior through association rules. The main goal is to facilitate education while helping learners attain proficiency in their courses. Feedback derived from the data mining techniques of the data management system is used to improve the interactive content. The results of this research can therefore help managers of educational systems plan instruction and optimize educational processes in the interactive environment.
Keywords: Behavioral Sciences, Interactive Learners, Data Mining Algorithms, Association Rules
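The abstract above names the Apriori algorithm but does not describe the data or thresholds used. The following is a minimal sketch of Apriori-style association rule mining in plain Python; the activity names (`forum_post`, `quiz`, `video`), the sample sessions, and the support and confidence thresholds are all hypothetical and are not taken from the study.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset with support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    # Level 1 candidates: every single item that appears anywhere.
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        # Keep the candidates whose support reaches the threshold.
        level = {}
        for c in candidates:
            support = sum(1 for t in transactions if c <= t) / n
            if support >= min_support:
                level[c] = support
        frequent.update(level)
        # Join step: merge frequent k-itemsets into (k+1)-itemsets.
        # Prune step: keep a candidate only if all its k-subsets are frequent.
        k += 1
        candidates = set()
        for a, b in combinations(list(level), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(s) in level for s in combinations(union, k - 1)
            ):
                candidates.add(union)
    return frequent


def association_rules(frequent, min_confidence):
    """Derive rules (antecedent, consequent, support, confidence)."""
    rules = []
    for itemset, support in frequent.items():
        if len(itemset) < 2:
            continue
        for r in range(1, len(itemset)):
            for subset in combinations(sorted(itemset), r):
                antecedent = frozenset(subset)
                # Every subset of a frequent itemset is itself frequent,
                # so its support is already in the dictionary.
                confidence = support / frequent[antecedent]
                if confidence >= min_confidence:
                    rules.append(
                        (antecedent, itemset - antecedent, support, confidence)
                    )
    return rules


# Hypothetical learner sessions: the set of activities seen in each session.
sessions = [
    {"forum_post", "quiz", "video"},
    {"forum_post", "quiz"},
    {"video", "quiz"},
    {"forum_post", "video"},
    {"forum_post", "quiz", "video"},
]
frequent_itemsets = apriori(sessions, min_support=0.5)
learner_rules = association_rules(frequent_itemsets, min_confidence=0.7)
```

Each resulting rule reads as "learners who did the antecedent activities also tended to do the consequent activities", which is the kind of relationship the abstract describes extracting from interaction data.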
Mass movements, given their nature, variety, and hazards, have always been of interest to researchers in various sciences. Extensive studies have addressed the identification of influencing factors, zoning, and modeling of this process, but few have applied data mining algorithms. In this research, with the aim of applying data mining, the landslides of the southeast of Neishabour County were investigated, and a hazard zonation map was prepared with two bivariate statistical methods, information value and area density. Fifteen information layers (elevation above sea level, land slope, slope aspect, climate, land use, pedology, vegetation cover, geology, evaporation, temperature, precipitation, land type, distance from road, distance from fault, and distance from stream) were digitized in ArcGIS, and the best algorithm and the influencing factors were identified using data mining algorithms in R. According to the results, geology, climate, slope aspect, distance from road, elevation, pedology, and land type were the most important factors in landslide occurrence. The results also indicated the superiority of the random forest algorithm, with an accuracy of 92%. Evaluation of the zonation map showed that 45.45% and 51.51% of the mass movements in the evaluation stage fell in the high and very high hazard zones, respectively, and the rest fell in lower-hazard zones. The results therefore indicate adequate modeling accuracy; in the comparison of the two statistical methods, the area density method was found more suitable than the information value method for the study area.
Keywords: Natural Hazards, Landslide, Information Value, Area Density, Data Mining
Introduction
Mass movements, because of their nature, variety, and hazards to human life and property, have always been of interest to scholars in many fields. Because this phenomenon has a complex mechanism and many interacting factors and variables can affect it, extensive studies have been conducted to identify the effective factors and to classify, zone, and model the process. In this study, the landslides of three watersheds in the southeast of Neishabour city were investigated, and a hazard zonation map was prepared using the bivariate statistical methods of information value and area density. Few studies have applied data mining methods to determine the variables that affect landslide occurrence; most rely on other statistical methods. Data mining, also known as knowledge discovery in databases, is a way to discover new and useful information in large volumes of data, and it has shown good performance. Some of the most important data mining algorithms are the decision tree, random forest, boosting, support vector machine, logistic regression, and neural network algorithms. Therefore, the aim of the present study was to prioritize the environmental factors affecting landslide occurrence (altitude, slope, aspect, climate, land use, pedology, vegetation cover, geology, evaporation, temperature, rainfall, land type, distance from road, distance from fault, and distance from river) using data mining algorithms, and to zone landslide susceptibility using the bivariate statistical models of information value and area density, in three watersheds (Kharv, Harimabad, and Grineh) in Razavi Khorasan province.
Methodology
The present study investigated the factors affecting landslide occurrence and its zoning in three watersheds, Kharv, Harimabad, and Grineh, in Razavi Khorasan province. First, 99 landslides were identified in the area and a landslide distribution map was prepared. Then, the factors affecting landslides in the watersheds were digitized in ArcGIS as 15 information layers: altitude, slope, aspect, climate, land use, pedology, vegetation cover, geology, evaporation, temperature, rainfall, land type, distance from road, distance from fault, and distance from river. Next, the best-performing algorithm and the factors affecting landslide occurrence were identified using data mining algorithms in R. Finally, landslide hazard zonation was carried out in the GIS software using bivariate statistical models.
Results
The results showed that the random forest algorithm, with an accuracy of 92%, performed best, and that geology, climate, aspect, distance from road, altitude, pedology, and land type were the most important variables in the models. Landslides were most likely in areas with west and northwest aspects, slopes steeper than 30 degrees, the mountain land type, a semi-humid climate, the 1500 to 2000 mm evaporation class, entisols, dense vegetation, garden, bush, and shrub land uses, proximity to roads and faults, greater distance from rivers, and altitudes of 2000 to 2500 m on phyllite, boulder, and sandstone formations. Evaluation of the zoning maps produced by the information value and area density methods showed that 45.45% and 55.55% of landslides were located in the high and very high risk zones, respectively, and the rest were in the very low, low, and moderate risk zones. In both methods, therefore, most landslides fell in the high and very high risk zones, which indicates that the model has suitable accuracy.
Discussion and Conclusions
According to the results of this research, geology, climate, aspect, distance from road, altitude, pedology, and land type were the most important factors in landslide occurrence. In addition, slope, land use, vegetation cover, distance from fault, and distance from river were identified as important factors in landslide development; these are classified as natural factors that can be influenced by human factors. A comparison of the two methods showed that the area density method was more appropriate than the information value method for the study area.
Keywords: Natural Hazards, Landslide, Data Mining Algorithms, Bivariate Statistical Methods, Hazard zoning
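The abstract names the information value and area density methods without giving their formulas. The sketch below uses the formulation commonly found in the landslide literature, which is an assumption here rather than something confirmed by the source: the information value of a factor class is the natural logarithm of the class landslide density divided by the overall landslide density, and the area density weight is the difference between the two densities expressed per mille. The slope classes and areas are hypothetical.

```python
import math


def bivariate_weights(class_area, class_landslide_area):
    """Compute information value and area density weights per factor class.

    class_area: {class_name: mapped area of the class}
    class_landslide_area: {class_name: landslide area inside the class}
    Both use the same area unit (for example, pixels or hectares).
    """
    total_area = sum(class_area.values())
    total_landslide = sum(class_landslide_area.values())
    overall_density = total_landslide / total_area

    weights = {}
    for name, area in class_area.items():
        density = class_landslide_area.get(name, 0.0) / area
        # Information value: ln(class density / overall density).
        # It is undefined for a class with no landslides, so return None.
        if density > 0:
            information_value = math.log(density / overall_density)
        else:
            information_value = None
        # Area density weight: density difference, scaled to per mille.
        area_density = 1000.0 * (density - overall_density)
        weights[name] = {
            "information_value": information_value,
            "area_density": area_density,
        }
    return weights


# Hypothetical slope layer with three classes (areas in arbitrary units).
slope_area = {"0-15": 500.0, "15-30": 300.0, ">30": 200.0}
slope_landslide_area = {"0-15": 5.0, "15-30": 9.0, ">30": 16.0}
slope_weights = bivariate_weights(slope_area, slope_landslide_area)
```

In both methods a positive weight marks a class with more landslides than the study-area average. Under this formulation, a hazard map would typically be built by summing each cell's class weights over all layers and then slicing the sum into zones such as low, moderate, high, and very high.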