Magiran | Keyword search: "random forest algorithm"

Predict Business performance using machine learning Random Forest algorithm

Ehsan Farbin*

Journal of Business Data Science Research, Volume:4 Issue: 1, Winter 2025, PP 1 -5

The application of machine learning algorithms in predictive analytics has become a pivotal element in contemporary business decision-making processes. This study explores the efficacy of the Random Forest algorithm, a renowned ensemble learning method, for predicting business performance metrics such as sales, revenue, and customer engagement. The Random Forest algorithm is esteemed for its capacity to handle large datasets with numerous input variables and its intrinsic mechanism for feature selection, thereby enhancing prediction accuracy while mitigating overfitting concerns. We collected a comprehensive dataset encompassing various business performance indicators and their potential determinants, such as market trends, customer demographics, operational metrics, and competitive landscape data. Following data preprocessing to ensure data quality and relevance, we applied feature selection techniques to isolate the most impactful predictors. We then partitioned the dataset into training and testing subsets for model development and evaluation, respectively. The Random Forest model was trained on the training set across a range of hyperparameter settings to identify the optimal configuration. Model validation was conducted using k-fold cross-validation to ensure generalizability across various data subsets. Post-training evaluation on the testing set employed standard performance metrics, namely Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), and the R-squared value, to assess the model’s predictive accuracy.

Keywords: Predictive Analytics, Random Forest Algorithm, Business Intelligence, Machine Learning, Data Preprocessing
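
The evaluation steps named in the abstract above (k-fold cross-validation, then MAE, RMSE, and R-squared on held-out data) can be sketched in plain Python. This is only an illustration under assumptions: the function names, the contiguous fold layout, and the sample sales figures are hypothetical and do not come from the study.

```python
import math

def mae(y_true, y_pred):
    """Mean Absolute Error: average of |actual - predicted|."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root Mean Squared Error: square root of the mean squared residual."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def r_squared(y_true, y_pred):
    """R-squared: 1 minus residual sum of squares over total sum of squares."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs for k contiguous folds (assumed layout)."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train_idx, test_idx
        start += size

# Hypothetical sales figures, for illustration only (not data from the study).
actual = [100.0, 120.0, 140.0, 160.0]
predicted = [110.0, 115.0, 135.0, 170.0]

print(mae(actual, predicted))        # 7.5
print(r_squared(actual, predicted))  # 0.875
```

In practice each fold's training indices would be used to fit a model and the test indices to score it; the metrics are then averaged over the folds.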

ارزیابی حساسیت زمین لغزش و تعیین عوامل موثر در وقوع آن با استفاده از الگوریتم جنگل تصادفی (مطالعه موردی: حوضه آبخیز گلندرود)

علی گیلانی پور، صدرالدین متولی*، خه بات درفشی

نشریه جغرافیا و مخاطرات محیطی، پیاپی 53 (بهار 1404)، صص 247 -274

پژوهشگران بسیاری سعی نموده اند که مدل هایی برای ارزیابی حساسیت خطر زمین لغزش ارائه داده و به عبارت دیگر، به نقشه پهنه بندی لغزش ها برسند که بیش تر بر اساس روش استقرایی و مدل سازی های کمی و آماری بوده است. به این صورت که عوامل مختلف موثر در وقوع زمین لغزش را بررسی نموده و سپس چگونگی تاثیر آن ها را در پراکندگی لغزش ها تحلیل کرده اند. حوضه آبخیز گلندرود با توجه به ویژگی های زمین شناسی، تکتونیکی، شرایط اقلیمی، هیدرولوژیکی، توپوگرافی و پوشش گیاهی فقیر، دارای پتانسیل لغزشی بوده و دخالت غیراصولی انسان در آن باعث وقوع و تشدید حرکات توده ای می شود. در پژوهش حاضر با رویکردی توصیفی- تحلیلی و پیمایشی، به منظور تهیه نقشه حساسیت به ناپایداری دامنه ای و لغزش های حوضه مطالعاتی از 11 فاکتور موثر در ناپایداری دامنه ای و الگوریتم جنگل تصادفی استفاده شده است. این فاکتور ها عبارت است از: شیب، جهت دامنه، ارتفاع، فاصله از جاده، فاصله از گسل، فاصله از آبراهه، مجموع بارندگی سالانه، میانگین دمای سالانه، کاربری زمین، زمین شناسی و انحناء دامنه ها. تعداد 352 نقطه لغزشی با استفاده از تصاویر ماهواره ای و بازدیدهای میدانی مشخص شدند که از این تعداد، 70 درصد برای آموزش مدل و 30 درصد باقیمانده آن برای اعتبارسنجی مورد استفاده قرار گرفت. در ادامه، از کدنویسی الگوریتم جنگل تصادفی در محیط MATLAB R2020a برای شناسایی پهنه های مستعد به حرکات لغزشی استفاده شد. با توجه به نقشه خطرپذیری زمین لغزش در حوضه آبخیز گلندرود، بیش از 30 درصد منطقه در کلاس خطر بسیار زیاد، 19 درصد در کلاس خطر زیاد، 13 درصد در کلاس خطر متوسط، 19 درصد در کلاس خطر کم و 16 درصد از حوضه مطالعاتی نیز در کلاس خطر زمین لغزش خیلی کم قرار دارد. اولویت بندی متغیرهای موثر بیان گر آن است که بیش ترین وزن با رتبه معیار 98/0 مربوط به ارتفاع می باشد. تحلیل مفهوم کاتنا که بیان گر ارتباط میان الگو و چشم انداز خاک بر روی شیب دامنه با توپوگرافی است و منجر به تغییرپذیری خصوصیات خاک و به دنبال آن تغییر در پوشش گیاهی می شود، می تواند توجیه ارتباط یا اثرگذاری عامل ارتفاع بر حرکات لغزشی منطقه مطالعاتی باشد. 
مطالعه دقیق موضوع دلایل وقوع حرکت توده ای در منطقه گلندرود و راه های پیش گیری از خسارات ناشی از آن توسط متخصصین ذی ربط، مهم ترین اقدام برای کاهش خسارت های ناشی از آن است.

کلید واژگان: لغزش زمین، ارزیابی حساسیت لغزش زمین، الگوریتم جنگل تصادفی، حوزه آبخیز گلندرود

Assessment of landslide sensitivity and determination of effective factors in its occurrence using the random forest algorithm (Case study: Glandrood watershed)

Ali Gilanipoor, Sadroddin Motevalli *, Khabat Derafshi

Journal of Geography and Environmental Hazards, Volume:14 Issue: 53, Spring 2025, PP 247 -274

The Glandrood watershed, given its geological, tectonic, climatic, hydrological characteristics, topography, and poor vegetation cover, has a landslide potential, and inappropriate human intervention in it leads to the occurrence and intensification of mass movements. In the present study, using a descriptive-analytical and survey approach, an attempt has been made to prepare a sensitivity map for slope instability and landslides in the study area using 11 factors effective in causing slope instability. These factors include: slope, aspect direction, elevation, distance from the road, distance from the fault, distance from the waterway, total annual precipitation, average annual temperature, land use, geology, and slope curvature. Then, a total of 352 landslide points were identified using satellite images and field visits, of which 70% were used for model training and the remaining 30% for validation. Subsequently, the random forest algorithm was coded in the MATLAB R2020a environment to identify areas susceptible to landslides. According to the landslide hazard map in the Glandrood watershed, over 30% of the area is classified as "very high risk," 19% as "high risk," 13% as "medium risk," 19% as "low risk," and 16% of the study area is classified as "very low" landslide risk. The prioritization of effective variables indicates that the highest weight, with a criterion ranking of 0.98, is related to elevation. The analysis of the catena concept, which reflects the relationship between soil patterns and landscape slopes with topography and leads to variability in soil properties and subsequently changes in vegetation cover, can well justify the relationship or influence of the elevation factor on landslide movements in the study area.

Keywords: Landslide, Susceptibility Assessment, Random Forest Algorithm, Glandrood Watershed
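
The 70/30 partition of the 352 landslide points mentioned in the abstract above can be illustrated as follows. The abstract does not say how the points were assigned, so the random shuffle, the seed, and the function name here are assumptions made purely for the sketch.

```python
import random

def split_points(points, train_fraction=0.7, seed=0):
    """Shuffle a copy of the points and cut it into training and validation parts."""
    rng = random.Random(seed)  # fixed seed (an assumption) keeps the split reproducible
    shuffled = points[:]
    rng.shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]

# Stand-in identifiers for the 352 inventoried landslide points.
point_ids = list(range(352))
train, validation = split_points(point_ids)
print(len(train), len(validation))  # 246 106
```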

Identification and Control of Credit Risk in Banks Utilizing New Supervisory Technologies with Neural Network Algorithm and Random Forest Algorithm

Maryam Mashrooti, Ali Mohammadi*, Mehdi Mohammadi

AI and Tech in Behavioral and Social Sciences, Volume:3 Issue: 1, Winter 2025, PP 65 -73

The purpose of this study is to identify and control credit risk in banks utilizing new supervisory technologies with the neural network algorithm and the random forest algorithm. This research, in terms of its nature and objective, is categorized as theoretical and applied research. Given the quantitative nature of the study and the use of data mining for customer credit scoring, this investigation is data-driven. The primary foundation of this research is the discovery of knowledge from banking databases. In this study, real customers who received credit facilities from Tejarat Bank and Saman Bank in Tehran over a one-year period, whether they returned the loans to the bank or not, were defined as the statistical population. Consequently, for sampling, all individual credit customers of the selected branches of these banks during the specified time frame were examined. Out of 500 credit customers, a simple random sampling method was employed to select those who had received loans during this period, resulting in the selection of 230 samples for this study. After collecting the previous bank customer data from the relevant database and cleaning the data, the influential variables in customer ranking were identified by reviewing previous scientific research. In the next phase, using the neural network algorithm and the random forest algorithm, and with the help of relevant software, customers were classified based on their characteristics, and their behavior was predicted. The findings indicated that the random forest algorithm was more efficient in predicting customer credit risk. Statistical test results showed that the support vector machine model had higher accuracy in predicting customer credit risk. The random forest (DT) algorithm used in this research had the highest accuracy among all models, and with feature selection, the model's accuracy increased compared to the base model, achieving the highest accuracy (81.49%) among all techniques

Keywords: Credit Risk, Bank, Credit Facilities, Neural Network Algorithm, Random Forest Algorithm

راهکارهای کاهش آسیب پذیری شریان های حیاتی شهرها در برابر سیل در شهر بجنورد

مهدی مباشری*، غلامرضا میری، زهرا شریفی نیا

فصلنامه تحقیقات جغرافیایی، سال سی و نهم شماره 2 (پیاپی 152، بهار 1403)، صص 117 -127

اهداف

شریان های حیاتی یا همان زیرساخت ها جزء بنیان های اصلی و چارچوب های پایه ای هر جامعه به شمار می آیند که دربرگیرنده تمامی تاسیسات و تسهیلات مورد نیاز آن جامعه اند و اگر منقطع یا تخریب شوند بر سلامتی، ایمنی، امنیت و اقتصاد جامعه تاثیر جدی خواهند گذاشت. هدف از این پژوهش بررسی میزان خطر سیل و ارزیابی شریان های حیاتی شهر بجنورد با استفاده از الگوریتم جنگل های تصادفی بود.

روش شناسی:

این پژوهش از نوع کاربردی بوده در سال 1402 در شهر بجنورد به انجام رسید. برای ارایه راهکارهای کاهش آسیب پذیری شریان های حیاتی شهر در برابر سیل از الگوریتم جنگل های تصادفی استفاده شد. پس از بررسی های انجام شده 100 نقطه سیل گیر و 100 نقشه بدون سیل شناسایی شده و از 14 عامل موثر در وقوع سیلاب شامل ارتفاع، شیب، جهت، بارش، زمین شناسی، تراکم آبراهه، تراکم مسکونی، تراکم جمعیت، مسیل ها، فاصله از دشت های سیلابی، کاربری اراضی، شاخص پوشش گیاهی، شاخص انحنای زمین و شاخص رطوبت توپوگرافی استفاده شد. برای محاسبه درجه اهمیت هر شاخص از شاخص نسبت افزایش اطلاعات استفاده شد.

یافته ها

ارتفاع، بارش و کاربری اراضی اصلی ترین عوامل تاثیرگذار بر وقوع سیل در شهر بجنورد بودند. همچنین 676 هکتار از منطقه مورد مطالعه در وضعیت بیشترین خطر سیل و 852 هکتار نیز در معرض کمترین خطر بودند. مساحت قابل توجهی از مناطق مسکونی دارای ریسک بالایی در مقابل سیلاب بودند.

نتیجه گیری

شبکه های خیابانی متصل، انتخاب دقیق مصالح سطح خیابان، قرارگیری ساختمان های استراتژیک، اندازه بلوک های کوتاه تر و الگوی شبکه ای با تقاطع های مکرر در کاهش تاثیر سیل حایز اهمیت هستند.

کلید واژگان: سیلاب، شریان های حیاتی، الگوریتم جنگل های تصادفی، شهر بجنورد

Strategies for Mitigating Vulnerability of Critical Urban Arteries to Flooding within Bojnurd City

M. Mobasheri*, Gh.R. Miri, Z. Sharifinia

Geographical Research, Volume:39 Issue: 2, 2024, PP 117 -127

Aims

The fundamental pillars and structures crucial to any community are the vital arteries or infrastructures, encompassing all essential facilities and utilities. If these are disrupted or damaged, it will have a significant impact on the health, safety, security, and economy of the society. The objective of this study was to evaluate the flood risk and analyze the critical arteries of Bojnurd city.

Methodology

This applied study was carried out in Bojnurd city in 2023 (1402 in the Iranian calendar). The Random Forest algorithm was employed to propose strategies for reducing the vulnerability of the city's vital arteries to floods. In the course of the research, 100 flood-prone locations and 100 non-flooded locations were identified. Fourteen factors influencing flooding were considered: elevation, slope, aspect, precipitation, geology, drainage density, residential density, population density, flood channels, distance from floodplains, land use, vegetation cover index, land curvature index, and topographic wetness index. The importance of each factor was determined by calculating the information gain ratio.
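
The information gain ratio used to rank the factors can be sketched as below. This is a minimal illustration, assuming each factor has already been binned into discrete classes; the function names and the toy flood data are hypothetical, not taken from the study.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in Counter(values).values())

def gain_ratio(feature_values, labels):
    """Information gain of the feature divided by the entropy of the feature itself."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Expected label entropy after splitting on the feature.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - conditional
    split_info = entropy(feature_values)
    return info_gain / split_info if split_info > 0 else 0.0

# Toy data (illustrative only): 1 = flooded point, 0 = non-flooded point.
flooded = [1, 1, 1, 0, 0, 0]
elevation_class = ["low", "low", "low", "high", "high", "high"]
land_use_class = ["urban", "urban", "farm", "farm", "urban", "farm"]

print(gain_ratio(elevation_class, flooded))  # 1.0: separates the classes perfectly
print(gain_ratio(land_use_class, flooded))   # much smaller: weak separation
```

Dividing by the split information penalizes factors with many distinct classes, which plain information gain would otherwise favor.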

Findings

Elevation, precipitation, and land use were the main factors influencing flooding in Bojnurd city. In addition, 676 hectares of the study area were in the highest flood-risk class, while 852 hectares were in the lowest. A considerable portion of the residential areas faced a high risk of flooding.

Conclusion

The interconnected nature of street networks, meticulous consideration of street surfacing materials, strategic positioning of buildings, smaller block dimensions, and the implementation of a grid layout featuring regular intersections play a crucial role in mitigating the effects of flooding.

Keywords: Flood, Vital Arteries, Random Forest Algorithm, Bojnurd City

Optimization of College English Dynamic Multimodal Model Teaching Based on Deep Learning

Yanli Zhao*, Nur Ainil Binti Sulaiman

Since 2010, deep learning has been further developed, and the concept of multi-modality has penetrated into all walks of life. However, it has not been fully researched and applied in college English teaching, so this study modeled and practiced the multimodal teaching method of college English under the deep learning mode and its application. The definitions of modality and medium are first introduced, and then the definition of multimodality in this study is clarified. Then the classification of multimodal transport is expounded. The random forest algorithm is chosen as the main algorithm of this research, and a dynamic multimodal model is established. After that, in collaboration with a university, sophomore students were selected for practice. After processing and analyzing the collected data, it was found that in the data sample of 268 students, the number of students who did not study independently accounted for 24%, which indicates that most college students lack interest in learning English. Preliminary tests were also conducted on students' English proficiency throughout the year, and the results showed that the students' English proficiency was at a pass level and the overall English proficiency was weak. Reassessment of students' English proficiency showed that the actual teaching effect of each English proficiency was greater than 85%, and the effectiveness of English teaching in the selected universities was significantly improved. The average score improved by 8 points, indicating that multimodal teaching is effective. After a semester of multimodal teaching, the English teaching effectiveness of the university selected in this article has significantly improved. The research results indicate that the development of deep learning has introduced multimodal concepts into the teaching field, where they are well suited to assisting language learning.

Keywords: deep learning, multimodal theory, random forest algorithm, dynamic multimodal model

واکاوی عوامل موثر بر آسیب پذیری کالبدی سکونتگاه های غیر رسمی (مطالعه موردی: محله اسدآبادی شهرخرم آباد)

حامد عباسی*

فصلنامه برنامه ریزی توسعه کالبدی، پیاپی 30 (تابستان 1402)، صص 1 -16

سکونتگاه های غیر رسمی، نماد ناپایداری در نظام های اقتصادی، اجتماعی و فرهنگی جامعه شهری است که پیامد آن، افزایش میزان آسیب پذیری در ابعاد مختلف، ازجمله در زمینه کالبدی است. این مناطق با انبوهی از مشکلات، کیفیت زندگی شهروندان را تحت تاثیر قرار داده و از طرفی مدیریت بر این بخش از شهر با سایر مناطق شهری متفاوت و دشوار است. بنا بر اهمیت تهدیدات سکونتگاه های غیر رسمی در مقابل مخاطرات محیطی، هدف این مقاله شناخت عوامل موثر در آسیب پذیری کالبدی و مدل سازی آن در محله اسدآبادی شهر خرم آباد است. روش تحقیق به لحاظ ماهیت و روش، تحلیلی توصیفی و به لحاظ هدف، کاربردی است. برای تجزیه وتحلیل داده ها، از مدل رگرسیون چند متغیره، مدل ماشینبردار پشتیبان (SVM-ԑ) و الگوریتم جنگل تصادفی (RF) استفاده شد. یافته های تحقیق نشان داد که محله اسدآبادی، در تمام شاخص های کالبدی در وضعیت نامناسبی قرار دارد و از منظر ساکنان محله اسدآبادی، شاخص خدمات عمومی با ضریب رگرسیونی 429/0 بیشترین اثر گذاری را بر آسیب پذیری کالبدی دارد. بر اساس معیارهای ارزیابی، مدل کرنل سیگمویید ماشین بردار از سایر روش ها نتایج بهتری را ارایه می دهد و این مدل، توانایی پیش بینی آسیب پذیری کالبدی مجتمع های سکونتی غیر رسمی در برابر مخاطرات را تا بیش از 60 درصد موارد، صحیح پیش بینی می کند.

کلید واژگان: آسیب پذیری کالبدی، سکونتگاه غیر رسمی، مدل ماشین بردار پشتیبان، الگوریتم جنگل تصادفی، خرم آباد

Analyzing the Factors Affecting the Physical Vulnerability of Informal Settlements (Case Study: Asadabadi Neighborhood, Khoramabad)

Hamed Abbasi*

Journal of Physical Development Planning, Volume:8 Issue: 30, 2023, PP 1 -16

Informal settlements are a symbol of instability in the economic, social and cultural systems of the urban community. The consequence of this instability is to increase the degree of vulnerability in various dimensions, including physical vulnerability. These areas have affected the quality of life of the citizens with their problems, and on the other hand, managing this part of the city is different and difficult compared to other urban areas. According to the importance of threats of informal settlements against environmental hazards, the purpose of this article is to study the effective factors of physical vulnerability and modeling in the Asad Abadi neighborhood of Khorramabad. The research method is analytical-descriptive. In terms of purpose, the research method is applied-developmental. For data analysis, multivariate regression model, support vector machine model (SVM-ԑ) and random forest (RF) algorithm were used. The research findings showed that Assad Abadi neighborhood is in poor condition according to all physical indicators. From the perspective of residents of Asad Abadi neighborhood, the public service index with a regression coefficient of 0.429 has the most effect on physical vulnerability. Based on the evaluation criteria, the sigmoidal kernel model of the vector machine gives better results than the other methods. This model correctly predicts the physical vulnerability of informal residential complexes to hazards in more than 60% of cases.

Keywords: Physical vulnerability, Informal settlement, Support Vector Model, Random Forest Algorithm, Khorramabad

Presenting the smart pattern of credit risk of the real banks’ customers using machine learning algorithm.

Hojjat Tajik, Ghodratollah Talebnia *, Hamid Reza Vakili Fard, Faegh Ahmadi

Advances in Mathematical Finance and Applications, Volume:8 Issue: 4, Autumn 2023, PP 1409 -1428

In the past, decisions on granting loans to bank customers in Iran were made traditionally, based on personal judgments about repayment risk. However, growing demand for banking facilities from businesses and households on the one hand, and intensified competition among banks and financial and credit institutions to reduce repayment risk on the other, have led to the application of novel approaches, including statistical methods. To predict the risk of default on facility repayment and to classify applicants, bankers now use customer credit ranking. Time efficiency, cost effectiveness, avoidance of personal judgment, and greater accuracy in examining applicants for various funds are among the salient merits of this approach. Various statistical methods, including biased analysis, logistic regression, and non-parametric parallelism, as well as others such as neural networks, have been employed for credit ranking. In this research, a smart pattern of the credit risk of real bank customers based on the random forest metaheuristic algorithm was presented (case study: Bank Tejarat). According to the skewness values, the data could be considered normally distributed. Based on the observed results, the lowest mean was related to the variable type of facility and the highest to the amount of facility.

Keywords: Smart pattern, bank customers’ risk, Credit Risk, Machine Learning, Random forest algorithm

پیش بینی بهینه بازده کوتاه مدت عرضه های اولیه با استفاده از الگوریتم های خفاش و جنگل تصادفی

حسین رستمخانی، بهروز خدارحمی*، آزیتا جهانشاد

نشریه دانش سرمایه گذاری، پیاپی 45 (بهار 1402)، صص 133 -158

هدف این پژوهش، پیش بینی بازده کوتاه مدت سهام در عرضه های اولیه با استفاده از الگوریتم های خفاش و جنگل تصادفی می-باشد. در این تحقیق، شرکتهایی که طی بازه زمانی1394 تا1399 برای اولین بار در فرابورس ایران عرضه شده اند به عنوان نمونه آماری انتخاب شدند. برای تجزیه وتحلیل داده ها از نرم افزار MATLAB استفاده گردید. برای آزمون فرضیه ها دو سناریو طرح گردید. سناریوی اول بصورت سالانه و سناریوی دوم بصورت6ساله در نظر گرفته شد. داد ه های مالی با 11 عامل: بازده کوتاه مدت بازار، بازده کوتاه مدت سهام جدید، تمایلات بازار، سن شرکت، اندازه شرکت، فروش سالانه ، بازده دارایی، بازده حقوق صاحبان سهام، قیمت انتشار سهام عرضه اولیه، سود عملیاتی، گردش نقدی از عملیات به عنوان عوامل تاثیرگذار و بازده مازاد سهم عرضه شده نسبت به بازار عامل تاثیرپذیر به عنوان پیش فرض های ورودی برای پیش بینی مقدار بهینه، وارد الگوریتم ها شدند. نتایج بدست آمده از الگوریتم خفاش حاکی از آن است که الگوریتم خفاش توانسته در هردو سناریو عملکرد بهتری در پیش بینی بازده کوتاه مدت سهام در عرضه های اولیه ارایه دهد و تفاوت چندانی ندارد. درحالی که نتایج دقت در پیش بینی الگوریتم جنگل تصادفی در سناریوی دوم به نسبت سناریوی اول حدود12 درصد افزایش یافته است. دلیل این تفاوت می تواند ناشی از بزرگ بودن فضای جستجو و کوتاه بودن طول زمانی داده ها برای الگوریتم جنگل تصادفی عنوان نمود. می توان نتیجه گرفت بکارگیری الگوریتم های نوپای خفاش وجنگل تصادفی در پیش بینی بازده کوتاه مدت سهام در عرضه های اولیه می تواند سرمایه گذاران را در پیش بینی بازده حداکثری و انتخاب بهنرین سهام براساس الگویی دقیق و با دقت بالا یاری نماید.

کلید واژگان: بازده کوتاه مدت، سهام عرضه اولیه، الگورتیم خفاش، الگوریتم جنگل تصادفی

Optimal short-term prediction of initial supply yields using bat and random forest algorithms

Hosein Rostamkhani, Behroz Khodarahmi *, Azita Jahanshad

Journal of Investment Knowledge, Volume:12 Issue: 45, 2023, PP 133 -158

The purpose of this study is to predict short-term stock returns in initial public offerings using the bat and random forest algorithms. Companies that were listed on the OTC market of Iran for the first time during the period 1394 to 1399 (Iranian calendar) were selected as the statistical sample. MATLAB software was used to analyze the data. Two scenarios were proposed to test the hypotheses: the first was annual and the second covered the full 6 years. Financial data with 11 factors (short-term market return, short-term return on new stock, market sentiment, company age, company size, annual sales, return on assets, return on equity, initial public offering price, operating profit, and cash flow from operations) entered the algorithms as the influencing factors, with the excess return of the offered share relative to the market as the dependent factor, to predict the optimal value. The results indicate that the bat algorithm performed better in predicting short-term stock returns in initial public offerings in both scenarios, with little difference between them, whereas the prediction accuracy of the random forest algorithm increased by about 12% in the second scenario compared with the first. This difference may be due to the large search space and the short time span of the data for the random forest algorithm. It can be concluded that the use of the emerging bat and random forest algorithms in predicting short-term returns can help investors in predicting maximum returns and selecting the best stocks based on a precise and accurate pattern.

Keywords: Short-term returns, initial public offering stock, Bat algorithm, Random Forest Algorithm

The Prediction of Low and High-Risk Zones of Tehran during COVID-19 by Using the Random Forest Algorithm

Najmeh Neysani Samani, Mehdi Farrokh Anari

The International Journal of Humanities, Volume:29 Issue: 4, 2022, PP 23 -35

The Coronavirus disease (Covid-19) is an infectious and contagious disease, also called 2019-nCoV acute respiratory disease. Its outbreak was first reported on December 31, 2019, in the Chinese city of Wuhan; it spread throughout the country within a few weeks and to several other countries, including Italy, the United States, and Germany, within a month. The disease was officially reported in Iran on February 19, 2020. In epidemiological situations it is important to detect and analyze high-risk zones and to establish regulations based on Geographic Information System (GIS) data and analyses. GIS, being location-based, can be effective in preventing the spread of Covid-19 by displaying and analyzing the dangerous zones where infected people are located. In fact, recognizing regions based on the risk of getting the disease can influence social restriction policies and urban movement rules in order to prepare daily and weekly plans in different urban regions. In this applied and analytical research, high- and low-risk zones of Tehran have been identified using the random forest algorithm, which is used for both classification and regression. The algorithm builds decision trees on data samples, obtains a prediction from each of them, and finally chooses the best solution by voting. In this research, 7 criteria affecting the level of risk of regions with respect to the Covid-19 virus have been used: subway lines, bus rapid transit lines, hospitals, administrative and commercial complexes, passageways, population density, and urban traffic. After producing the map of high-risk zones of Covid-19, the Receiver Operating Characteristic (ROC) curve was used for evaluation. The area under the curve (AUC) obtained from ROC is 98.8%, which indicates the high accuracy of this algorithm in predicting high- and low-risk zones for Covid-19.

Keywords: Covid-19, Location Analysis, Random Forest Algorithm, Epidemiology
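
The mechanism summarized in the abstract above (trees built on data samples, one prediction per tree, a final choice across trees) can be sketched in a few lines. This is a deliberately simplified illustration, not the authors' implementation: it uses one-split "stumps" instead of full decision trees, picks one random feature per stump, and takes a majority vote. All names and the toy data are hypothetical.

```python
import random
from collections import Counter

def fit_stump(rows, labels, rng):
    """Fit a one-split tree on a randomly chosen feature."""
    feature = rng.randrange(len(rows[0]))
    best = None
    for threshold in sorted({row[feature] for row in rows}):
        left = [l for row, l in zip(rows, labels) if row[feature] <= threshold]
        right = [l for row, l in zip(rows, labels) if row[feature] > threshold]
        if not left or not right:
            continue
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0]
        errors = sum(l != left_label for l in left) + sum(l != right_label for l in right)
        if best is None or errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    if best is None:  # no valid split: always predict the majority class
        majority = Counter(labels).most_common(1)[0][0]
        return feature, float("inf"), majority, majority
    return feature, best[1], best[2], best[3]

def predict_stump(stump, row):
    feature, threshold, left_label, right_label = stump
    return left_label if row[feature] <= threshold else right_label

def fit_forest(rows, labels, n_trees=25, seed=0):
    """Train each stump on a bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]
        forest.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx], rng))
    return forest

def predict_forest(forest, row):
    """Majority vote over the individual stump predictions."""
    votes = Counter(predict_stump(stump, row) for stump in forest)
    return votes.most_common(1)[0][0]

# Toy zones (illustrative only): (distance-type score, density-type score).
rows = [(1, 10), (2, 12), (3, 11), (2, 9), (8, 50), (9, 55), (7, 52), (8, 48)]
labels = ["low", "low", "low", "low", "high", "high", "high", "high"]
forest = fit_forest(rows, labels)
print(predict_forest(forest, (0, 0)), predict_forest(forest, (10, 100)))
```

Full random forests grow deeper trees and resample candidate features at every split; for regression the vote is replaced by an average.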

بهینه سازی آشفتگی اسامی نویسندگان مقالات فارسی با استفاده از روش جنگل تصادفی

نیلوفر مظفری*، نرجس ورع

پژوهشنامه علم سنجی، پیاپی 16 (پاییز و زمستان 1401)، صص 203 -220

هدف

ارایه چارچوبی جهت حل مشکل آشفتگی و پراکندگی اسامی نویسندگان در مقالات فارسی که منجر به گسیختگی و فقدان جامعیت در بازیابی اطلاعات شده است.

روش شناسی:

پژوهش حاضر از نوع کاربردی علم سنجی است که به روش اسنادی انجام شده است. جامعه آماری را از 913 رکورد از نام نویسندگان مقالات فارسی برگرفته از پایگاه استنادی علوم جهان اسلام، طی بازه زمانی 1395 تا 1397 تشکیل می دهد. چارچوب پیشنهادی از سه مرحله جستجو، تطابق و گروه بندی تشکیل شده است. در این راستا، بعد از پیش پردازش اولیه و استخراج ویژگی، عملیات جستجو با هدف یافتن رکوردهایی که بالقوه احتمال یکسان بودن آنها وجود دارد انجام شده و سپس رکوردهای یکسان از طریق بررسی های بیشتر در مرحله تطابق که مبتنی بر جنگل تصادفی است یافت می شود.

یافته ها:

ویژگی های پست الکترونیک، نام خانوادگی و نام از مهم ترین ویژگی ها برای بهینه سازی آشفتگی نگارش اسامی هستند. استفاده از جنگل تصادفی به عنوان طبقه بند در مرحله تطابق، با دقت بالای 99 درصد می تواند مشکل آشفتگی نگارش اسامی نویسندگان را برطرف نماید.

نتیجه گیری:

نتایج نشان از کارایی بالای این روش در یکدست سازی اسامی با توجه به معیارهای دقت، بازیافت و مقدار اف نسبت به طبقه بندهای بردار پشتیبان، نزدیک ترین همسایه و ژنتیک دارد.

کلید واژگان: آشفتگی نگارش، جنگل تصادفی، نویسندگان مقالات فارسی، مستندسازی نام ها، الگوریتم ساندکس

Optimizing Confusion of Authors’ Names in Persian Articles Using Random Forest Algorithm

Niloofar Mozafari *, Narjes Vara

Scientometric research journal, Volume:8 Issue: 16, 2022, PP 203 -220

Purpose

Name is a key factor for distinguishing authors. In academic databases that store information on papers, searching by author name is one of the most important elements in increasing visibility and in quantitative scientometric studies, including citation counts. Variation in how names are written is one of the issues that lead to challenges in various scientific fields. In addition, the lack of writing standards in the Persian language, the lack of standard keyboards and character codes, and the habit of simplified writing are among the factors that lead to author name ambiguity. Also, spelling mistakes made when writing names create different written forms of a single name. Considering the importance of solving the confusion of authors’ names in Persian articles, this paper aims to propose a framework to solve the problem of confusion and dispersion of authors' names in Persian articles, which has led to fragmentation and a lack of comprehensiveness in information retrieval.

Methodology

The present research is an applied scientometric study carried out using a documentary method, and the required data were collected from the ISC database. The initial statistical population is 913 records from the period 2015 to 2017. The proposed framework consists of three stages: searching, matching, and grouping. After initial pre-processing and feature extraction, the search operation is performed to find records that are potentially identical. Our method extracts two types of features, internal and external. The internal features are extracted from the author’s information, such as first name, last name, affiliation, email, and co-authors. The external features use the scientific history of authors, such as articles and research interests. In the search stage, we propose a new method called Farsi-Soundex, inspired by the well-known Soundex, to group names that potentially refer to the same author. Identical records are then found through further investigation in the matching stage, which is based on random forests. The input of the matching stage is therefore a group of records that have been flagged as the same by the Farsi-Soundex algorithm, and a random forest classifier is applied to decide whether these records are in fact the same. Finally, in the grouping stage, all the records that the random forest has identified as the same are placed in one group by a hash-based algorithm.
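
The search stage can be pictured with the classic English Soundex code, which the abstract names as the inspiration for Farsi-Soundex. The abstract does not specify the Farsi-Soundex rules, so the sketch below implements only standard American Soundex on romanized names and uses a dictionary keyed by the code as a simple hash-based blocking step; the function names and example surnames are assumptions for illustration.

```python
from collections import defaultdict

# Standard Soundex consonant classes; vowels, Y, H, and W carry no code.
_CODES = {}
for digit, letters in {"1": "BFPV", "2": "CGJKQSXZ", "3": "DT",
                       "4": "L", "5": "MN", "6": "R"}.items():
    for letter in letters:
        _CODES[letter] = digit

def soundex(name):
    """Return the 4-character Soundex code of a romanized name."""
    letters = [ch for ch in name.upper() if ch.isalpha()]
    if not letters:
        return ""
    code = letters[0]
    previous = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        digit = _CODES.get(ch, "")
        if digit and digit != previous:
            code += digit
        if ch not in "HW":  # H and W do not separate equal codes; vowels do
            previous = digit
    return (code + "000")[:4]

def block_by_soundex(surnames):
    """Group surnames that share a code, so only those are compared later."""
    blocks = defaultdict(list)
    for surname in surnames:
        blocks[soundex(surname)].append(surname)
    return dict(blocks)

print(soundex("Robert"), soundex("Rupert"))  # R163 R163
print(block_by_soundex(["Mozafari", "Mozaffari", "Vara"]))
```

Only records that land in the same block would then be passed to the matching classifier, which keeps the number of pairwise comparisons manageable.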

Findings

The internal features of Email address, last name, and first name are the most significant features to optimize name-writing confusion. Also, the obtained results show the external features of the main subject and sub-subject provide the least effective features for solving the author name disambiguation problem in the academic database. In addition, using a random forest as a classifier in the matching phase, with an accuracy of over 99%, can solve the problem of confusion in writing the authors' names.

Conclusion

Results show the high efficiency of our framework in uniformity of names according to the criteria of accuracy, recall, and F value compared to the support vector machine, the nearest neighbor, and genetics. Our proposed method can be applied to scientific databases to standardize the names of the authors. In the future, we are investigating the efficiency of our proposed framework in a non-stationary environment in which the distribution of data may be changed over time.

Keywords: Name ambiguity, Article authors Persian articles, Random forest algorithm, Name Authority, Farsi-Soundex algorithm
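
The evaluation criteria named in the conclusion above can be computed as follows for a binary "same author / different author" decision. This is a sketch under assumptions: "accuracy" in the abstract is read here as precision, the F value is taken to be the balanced F1 score, and the labels are invented for illustration.

```python
def precision_recall_f1(true_labels, predicted_labels, positive=1):
    """Precision, recall, and F1 for the class marked as positive."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Hypothetical record pairs: 1 = same author, 0 = different authors.
truth = [1, 1, 1, 0, 0, 1]
guess = [1, 1, 0, 0, 1, 1]
print(precision_recall_f1(truth, guess))  # (0.75, 0.75, 0.75)
```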

انتخاب بهینه سهام با استفاده از الگوریتم خفاش و جنگل تصادفی

حسین رستمخانی، بهروز خدارحمی*، آزیتا جهانشاد

نشریه مهندسی مالی و مدیریت اوراق بهادار، پیاپی 48 (پاییز 1400)، صص 461 -480

هدف این تحقیق انتخاب بهینه سهام با استفاده از الگوریتم خفاش و جنگل تصادفی است. در این پژوهش براساس تحلیل6 متغیر: نسبت قیمت سهام بر سود هر سهم، نرخ رشد سود سالانه، نرخ رشد فروش سالانه، بازده دارایی ها، بازده حقوق صاحبان سهام و سهام شناور آزاد استخراج شده از 181 شرکت پذیرفته شده بورس اوراق بهادار تهران، در طی دوره زمانی 1394 تا 1398 استفاده شده است. 6 سناریو به منظور برآورد دقت دو الگوریتم در نظر گرفته شده است به طوریکه برای سناریوهای 1 تا 6 از الگوریتم ها خواسته شده است تا به ترتیب 5، 10، 15، 20، 25 و 30 شرکت پیدا کند. نتایج نشان می دهد که ماهیت الگوریتم جنگل تصادفی نیاز به آموزش و انتخاب ویژگی ها دارد که باعث می شود سرعت الگوریتم پایین تر باشد و زمان همگرایی را بالا می برد. یکی از علت اساسی دقت بالا تر الگوریتم جنگل تصادفی در سناریوهای 1 تا 3 این مورد می تواند باشد. در سناریوها 4 تا 6 به علت افزایش پیچیدگی مساله دقت الگوریتم جنگل تصادفی کاهش پیدا می کند ولی به دلیل ماهیت تصادفی بودن الگوریتم خفاش دقت آن تفاوت چندانی ندارد و می تواند پایداری در انتخاب خود را حفظ نماید.

کلید واژگان: انتخاب سهام، الگوریتم خفاش، الگوریتم جنگل تصادفی

Optimal stock selection using bat and random forest algorithm

Hosein Rostamkhani, Behroz Khodarahmi *, Azita Jahanshad

Financial Engineering and Portfolio Management, Volume:12 Issue: 48, 2021, PP 461 -480

The purpose of this study is optimal stock selection using the bat algorithm and the random forest algorithm. The study analyzes six variables (price-to-earnings ratio, annual earnings growth rate, annual sales growth rate, return on assets, return on equity, and free float) extracted from 181 companies listed on the Tehran Stock Exchange over the period 1394 to 1398 (Solar Hijri calendar). Six scenarios are considered to estimate the accuracy of the two algorithms: in scenarios 1 to 6, the algorithms are asked to select 5, 10, 15, 20, 25, and 30 companies, respectively. The results show that the random forest algorithm, by its nature, requires training and feature selection, which makes it slower and increases its convergence time. This may be one of the main reasons for its higher accuracy in scenarios 1 to 3. In scenarios 4 to 6, the accuracy of the random forest algorithm decreases as the problem becomes more complex, whereas the accuracy of the bat algorithm, owing to its random nature, does not change much, and it maintains stability in its selections.

Keywords: Stock Selection, Bat algorithm, Random forest algorithm

تحلیل مکانی خطر زمین لغزش با تاکید بر عوامل ژئومورفولوژیک با استفاده از مدل جنگل تصادفی (مطالعه موردی: شهرستان لارستان در استان فارس)

محمدابراهیم عفیفی*

نشریه جغرافیای طبیعی، پیاپی 51 (بهار 1400)، صص 39 -53

با توجه به توانایی تکنیک های داده کاوی، کاربرد آن ها در رشته های علوم زمین گسترش فراوانی داشته است. هدف از پژوهش حاضر پهنه بندی حساسیت زمین لغزش با استفاده از الگوریتم جنگل تصادفی، در شهرستان لارستان، استان فارس است. جنگل های تصادفی یک نوع مدرن از درخت - پایه هستند که شامل انبوهی از درخت های کلاس بندی و رگرسیونی می باشند. الگوریتم جنگل تصادفی مبتنی بر دسته ای از درخت های تصمیم است و در حال حاضر یکی از بهترین الگوریتم های یادگیری است. برای انجام پژوهش حاضر لایه های اطلاعاتی درجه شیب، جهت شیب، ارتفاع از سطح دریا، شکل شیب، فاصله از گسل، فاصله از آبراهه، فاصله از جاده، بارندگی، لیتولوژی و کاربری اراضی به عنوان عوامل موثر بر وقوع زمین لغزش شناسایی و نقشه های آن در نرم افزار ArcGIS10/2 رقومی و تهیه شدند. سپس با استفاده از الگوریتم جنگل تصادفی، ارتباط بین عوامل موثر و موقعیت زمین لغزش هاو وزن هر یک از آن ها در نرم افزار آماری R محاسبه و در نهایت جهت تهیه نقشه حساسیت زمین لغزش منطقه مورد مطالعه به محیط GIS منتقل گردید. نتایج ارزیابی دقت روش پهنه بندی با استفاده از منحنی تشخیص عملکرد نسبی و 30 درصد نقاط لغزشی استفاده نشده در فرآیند مدل سازی، بیانگر دقت عالی مدل جنگل تصادفی با سطح زیر منحنی 8/98 درصد است. توصیه اجرایی جهت کاهش خطر پایدارسازی مناطق ناپایدار و دوری جستن از این مناطق است؛ و هرگونه برنامه ریزی در توسعه آتی عناصر کالبدی زیرساختی شهری باید با توجه به احتمال سانحه زمین لغزش صورت گیرد.

کلید واژگان: الگوریتم جنگل تصادفی، لارستان، زمین لغزش، منحنی راک

Spatial analysis of landslide risk with emphasis on geomorphological factors using the random forest model (Case study: Larestan city in Fars province)

Mohammad Ibrahim Afifi *

Journal of Physical Geography, Volume:14 Issue: 51, 2021, PP 39 -53

Owing to the capabilities of data mining techniques, their application in the earth sciences has expanded widely. The purpose of this study is landslide susceptibility zoning using the random forest algorithm in Larestan, Fars province. Random forests are a modern tree-based method comprising a large number of classification and regression trees. The random forest algorithm is built on an ensemble of decision trees and is currently regarded as one of the best learning algorithms. For this study, the information layers of slope degree, slope aspect, elevation, slope shape, distance from fault, distance from waterway, distance from road, rainfall, lithology, and land use were identified as factors affecting landslide occurrence, and their maps were digitized and prepared in ArcGIS 10.2. Then, using the random forest algorithm, the relationship between these factors and the landslide locations, along with the weight of each factor, was calculated in the R statistical software and finally transferred to the GIS environment to prepare the landslide susceptibility map of the study area. Accuracy assessment using the receiver operating characteristic (ROC) curve and the 30% of landslide points not used in modeling indicates excellent accuracy for the random forest model, with an area under the curve of 98.8%. The practical recommendation for reducing risk is to stabilize unstable areas and to avoid them; any planning for the future development of urban physical infrastructure should take the possibility of landslides into account.
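The evaluation step in this abstract, scoring held-out landslide points and summarizing the ROC curve by its area, can be sketched as follows. This is a minimal illustration and not the author's R workflow; the function name `roc_auc` and the toy labels and scores are hypothetical.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve, computed as the probability that a randomly
    chosen positive example receives a higher score than a randomly chosen
    negative one (ties count as one half)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical held-out points: 1 = landslide, 0 = no landslide,
# with susceptibility scores produced by some fitted model.
held_out_labels = [1, 0, 1, 0]
held_out_scores = [0.9, 0.8, 0.4, 0.1]
print(roc_auc(held_out_labels, held_out_scores))
```

An area of 1.0 means every landslide point outranks every non-landslide point, and 0.5 corresponds to random ranking, so a reported value of 98.8% sits very close to perfect separation on the held-out points.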

Keywords: Random forest algorithm, Larestan, landslide, ROC curve

بررسی الگوی مصرف آب خانوارهای روستایی و شهری ایران با استفاده از الگوریتم جنگل تصادفی

ندا بیات *، علی اصغر سالم

فصلنامه پژوهش های اقتصادی ایران، پیاپی 84 (پاییز 1399)، صص 69 -114

وضعیت بحرانی آب و رشد تقاضای آن در ایران، تخریب و بهره برداری بی رویه از منابع آب های زیرزمینی و تداوم خشکسالی در سال های گذشته، مدیریت تقاضای آب را در تمامی بخش های مصرف از جمله بخش خانگی به یک دغدغه مهم در سیاست های کشور تبدیل کرده است. بدون شناسایی عوامل تاثیرگذار و اهمیت آن ها در الگوی مصرف آب خانگی، برنامه ریزی و اجرای سیاست های کارآمد امکان پذیر نیست. در همین راستا، این پژوهش درصدد شناسایی عوامل اقتصادی- اجتماعی موثر بر تقاضای آب و تعیین درجه اهمیت آن ها در الگوی مصرف آب خانوار برآمده است. با توجه به عدم امکان تصریح روابط ریاضی دقیق میان این عوامل و مصرف آب در این پژوهش از الگوریتم جنگل تصادفی برای شناسایی عوامل موثر استفاده شده است. همچنین با هدف کارایی بیشتر به دلیل تفاوت های موجود در سبک زندگی و فرهنگی خانوارهای روستایی و شهری، این دو گروه به تفکیک مورد بررسی قرار گرفته اند. در نهایت 17 عامل موثر در سه سطح متفاوت تاثیرگذاری شناسایی شدند که از این میان متغیرهای درآمد خانوار، مساحت خانه و سن سرپرست واستفاده از کولرهای آبی مهم ترین متغیرهای کمی و کیفی اثرگذار بر مصرف خانوارهای شهری و روستایی شناسایی شدند. سایر عوامل نیز به ترتیب اثرگذاری رتبه بندی شدند.

کلید واژگان: الگوی مصرف آب، الگوریتم جنگل تصادفی، عوامل اقتصادی- اجتماعی

Modeling the Pattern of Water Consumption Expenditures in Rural and Urban Households in Iran using Random Forest Algorithm

Neda Bayat *, Ali Asghar Salem

Iranian Journal of Economic Research, Volume:25 Issue: 84, 2021, PP 69 -114

The critical state of water resources and growing water demand in Iran, together with the destruction and overexploitation of groundwater resources and persistent drought, have made water demand management an important policy concern in all sectors, including the household sector. Without identifying the influential factors and their importance in the pattern of household water consumption, planning and implementing effective policies is impossible. This study identifies the socio-economic factors affecting water demand and determines their degree of importance in the pattern of household water consumption. Because the exact mathematical relationships between these factors and water consumption cannot be specified, the study uses the random forest algorithm to determine the most important factors. Also, because of differences in the lifestyles and cultures of rural and urban households, the two groups were studied separately. According to the results, 17 important factors were identified at three different levels of influence. Household income, house size, and age of the household head were identified, in that order, as the three most important quantitative variables affecting both urban and rural household consumption. Among the qualitative variables, the use of an evaporative cooler was the most important. The other factors were ranked in order of effect.

Keywords: Water Consumption, Random Forest Algorithm, Socio-Economic Factors

اولویت بندی عوامل موثر بر وقوع حرکات دامنه ای و تهیه نقشه پهنه ی خطر وقوع آن با استفاده از الگوریتم نوین جنگل تصادفی(مطالعه موردی: بخشی از حوضه آبریز سد لتیان)

لیلا ابراهیمی*

نشریه جغرافیای طبیعی، پیاپی 49 (پاییز 1399)، صص 125 -143

اولین گام مهم و اساسی در ارزیابی خطر حرکات دامنه ای تهیه نقشه های خطر وقوع حرکات دامنه ای است، این نقشه ها به عنوان یک محصول نهایی است که می تواند برای برنامه ریزی کاربری اراضی مفید واقع شود. هدف از این پژوهش اولویت بندی عوامل موثر در وقوع حرکات دامنه ای و پهنه بندی خطر وقوع آن با استفاده از الگوریتم نوین جنگل تصادفی در نقشه توپوگرافی 1:50000 لشکرک در قسمت شمالی سد لتیان می باشد. با توجه به ویژگی های هیدرولوژیکی، توپوگرافی، زمین شناسی، ژیومورفولوژیکی و اقلیمی منطقه مهمترین عوامل موثر در وقوع حرکات دامنه ای 9 شاخص فاصله از جاده،گسل، رودخانه، لیتولوژی، میزان بارش، ارتفاع، شیب، جهت شیب و کاربری اراضی به عنوان مهم ترین عوامل موثر در وقوع این نوع حرکات در منطقه مد نظر قرار گرفت.نتایج تحقیق نشان می دهد سه عامل فاصله از گسل، جاده و میزان شیب به ترتیب جزء سه عامل مهم در وقوع حرکات دامنه ای در منطقه مورد مطالعه می باشد. به منظور ارزیابی مدل تهیه شده، از منحنی تشخیص عملکرد نسبی استفاده گردید. براساس نتایج منحنی راک، مقدار سطح زیر منحنی در نقاط آموزشی برابر با 826/0 و در نقاط ارزیابی برابر با 839/0 درصد برآورد گردیده است. که شکل شماره 9 ارزیابی خیلی خوب الگوریتم جنگل تصادفی در پهنه بندی خطر وقوع حرکات دامنه ای با استفاده از این مدل است.

کلید واژگان: حرکات دامنه ای، الگوریتم جنگل تصادفی، منحنی راک

Prioritization of factors affecting the occurrence of slope movements and preparation of a zoning map of the risk of its occurrence using a new random forest algorithm (Case study: part of the catchment area of Latian Dam)

Leila Ebrahimi *

Journal of Physical Geography, Volume:13 Issue: 49, 2021, PP 125 -143

The first and most important step in assessing the risk of slope movements is to prepare hazard maps of their occurrence. These maps are a final product that can be useful for land use planning. The purpose of this study is to prioritize the factors affecting the occurrence of slope movements and to zone the risk of their occurrence using a new random forest algorithm on the 1:50,000 Lashkarak topographic map in the northern part of Latian Dam. Based on the hydrological, topographic, geological, geomorphological, and climatic characteristics of the region, nine indices were considered as the most important factors affecting this type of movement: distance from road, distance from fault, distance from river, lithology, rainfall, elevation, slope, slope aspect, and land use. The results show that distance from fault, distance from road, and slope are, in that order, the three most important factors in the occurrence of slope movements in the study area. The receiver operating characteristic (ROC) curve was used to evaluate the model. Based on the ROC results, the area under the curve is 0.826 for the training points and 0.839 for the evaluation points. This indicates a very good performance of the random forest algorithm in zoning the risk of slope movements (Figure 9).

Keywords: Slope Movements, Random forest algorithm, ROC curve

مدلسازی رابطه بین سرزندگی شهری و حس تعلق مکانی در شهر قاین

احمد اسدی*، مهدی مودودی، سعید حسین آبادی

فصلنامه پژوهش و برنامه ریزی شهری، پیاپی 40 (بهار 1399)، صص 17 -30

سرزندگی یکی از معیارهای اصلی شهرهایی است که از کیفیت بالای برنامه ریزی و طراحی برخوردارند. محیط های شهری سرزنده، زمینه ساز تعاملات اجتماعی، خلق و افزایش سرمایه اجتماعی و حس تعلق به مکان می گردند. هدف این مطالعه تحلیل رابطه سرزندگی و حس تعلق مکانی در شهر قاین است. این تحقیق جزو تحقیقات پیمایشی و ابزار جمع آوری داده ها، پرسشنامه می باشد. حجم نمونه آماری 382 نفر از شهروندان شهر قاین می باشد که با استفاده از فرمول کوکران انتخاب شده است. متغیرهای مستقل این تحقیق شاخص های سرزندگی شهری (دسترسی، پویایی اجتماعی، سرزندگی اقتصادی، امنیت شهری، طراحی معابر، مبلمان، روشنایی فضاهای شهری، سیما و منظر شهری و خوانایی) و متغیر وابسته، حس تعلق مکانی می باشد. برای تحلیل رابطه متغیرها از مدل های k- نزدیک ترین فاصله و الگوریتم جنگل های تصادفی استفاده شده است. شاخص سرزندگی از 28 گویه تشکیل شده است. یافته های تحقیق نشان می دهد که میانگین 17 گویه آن در سطح مناسبی قرار ندارد و 11 گویه آن در سطح قابل قبولی هستند. در کل سرزندگی شهری در شهر قاین پایین است. همچنین میانگین متغیر حس تعلق مکانی در سطح متوسطی قرار دارد. با توجه به خروجی مدل ها روش K نزدیک ترین همسایه نتایج بهتری از مدل جنگل تصادفی داشته است. عملکرد مدل K-NN گویای آن است که این مدل تاثیر سرزندگی بر تعلق مکانی را با ضریب همبستگی 0/82و میزان خطای 66/0 و ریسک برآورد 0/43 شبیه سازی کرده است. بر اساس مدل ایجاد شده توسط الگوریتم جنگل تصادفی متغیر دسترسی بیشترین و خوانایی کمترین تاثیر را در حس تعلق در شهر قاین را دارند.

کلید واژگان: سرزندگی شهری، حس تعلق مکانی، K نزدیک ترین همسایه، الگوریتم جنگل تصادفی، شهر قائن

The relation between urban vitality and sense of place attachment (Case study: Qaen city)

Ahmad Asadi *, Mehdi Mododi, Saeed Hossein Abadi

Research and Urban Planning, Volume:11 Issue: 40, 2020, PP 17 -30

Vitality is one of the main criteria of cities with high-quality planning and design. Lively urban environments foster social interaction, increase social capital, and strengthen place attachment. The aim of this paper is to analyze the relationship between urban vitality and place attachment in Qaen city. The study is survey research, and the data collection tool is a questionnaire. The sample consists of 382 citizens of Qaen, a size calculated using the Cochran formula. The independent variables are the urban vitality indicators (access, social dynamics, economic vitality, urban security, street design, furniture, lighting of urban spaces, urban landscape, and readability), and the dependent variable is the sense of place attachment. The k-nearest neighbors model (K-NN) and the random forest algorithm (RF) were used to analyze the relationship between the variables. The vitality indicator consists of 28 items; the averages of 17 items are not at an appropriate level and 11 are at an acceptable level. In general, urban vitality is low in Qaen city. The mean sense of place attachment is at a moderate level. According to the model outputs, the K-NN method provides better results than the random forest model. The K-NN model simulated the impact of vitality on place attachment with a correlation coefficient of 0.82, an error level of 0.66, and an estimation risk of 0.435. Therefore, this model can correctly simulate place attachment in 57% of cases. Based on the model generated by the random forest algorithm, the access variable has the greatest effect and readability the least effect on the sense of place attachment in Qaen.

Extended Abstract

Introduction

Vitality, as a component of the overall quality of urban design, can affect place attachment and improve the social health of citizens. Hence, the need to enhance the vitality of urban spaces and neighborhoods has emerged as one of the paradigms of urban planning. The ultimate goal of urban planning is to provide suitable living conditions in the city and in urban residential environments, which is closely related to the concepts of vitality and livability. A city where people live should be inviting and create a passionate environment. In the cities of Iran, vitality has not received much attention. Neighborhoods have been shaped without regard to the background and nature of traditional neighborhoods, under the dominance of material factors over spiritual ones and of machines over humans, and have been treated at the same level as urban divisions. The continuation of this trend has left the civic life of neighborhoods and urban spaces lacking in vitality. At the same time, the physical spaces of neighborhoods in historical Iranian cities were shaped by various factors, including cultural, social, and natural ones. Increasing urban vitality makes the population living in a city more willing to stay and also attracts people who intend to migrate to the country's metropolitan areas. In this research, we study the relationship between the vitality of urban spaces and citizens' sense of belonging to the city of Qaen. The research matters because place attachment plays an important role in the development of any city and reduces migration from one region to another, an effect that is strongly felt in the South Khorasan region. Place attachment is influenced by various factors, one of which is urban vitality.
Therefore, the purpose of this study is to model the relationship between urban vitality and sense of belonging in Qaen using the k-nearest neighbor and random forest algorithms.

Methodology

The research is applied in purpose and descriptive-analytic in nature, and data were collected through library and field methods. The statistical population is the residents of Qaen, about 42,323 people (according to figures for the year 1395). The sample size, based on the Cochran method, is 382 persons, and the sampling method is simple random. The study investigates the relationship between the vitality index and the sense of belonging. The vitality index includes the variables accessibility, social dynamics, economic vitality, urban security, passage design, furniture, urban lighting, urban landscape, and readability. These variables are treated as independent variables and the sense of belonging as the dependent variable. K-nearest neighbor and random forest were used to model the relationship between the independent variables of urban vitality and the dependent variable of sense of belonging.
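The sample size calculation mentioned above can be illustrated with Cochran's formula and a finite population correction. The abstract does not state the confidence level, proportion, or margin of error, so the values below (z = 1.96, p = 0.5, margin = 0.05) are assumptions chosen because they are the most common defaults; `cochran_sample_size` is a hypothetical helper name.

```python
import math


def cochran_sample_size(population, z=1.96, p=0.5, margin=0.05):
    """Cochran's sample size with finite population correction.

    z, p and margin are assumed defaults (95% confidence, maximum
    variability, 5% margin of error); the source does not report them.
    """
    n0 = (z ** 2) * p * (1 - p) / margin ** 2   # infinite-population size
    n = n0 / (1 + (n0 - 1) / population)        # finite population correction
    return math.ceil(n)


print(cochran_sample_size(42323))
```

Under these assumed inputs the formula gives roughly 381 for a population of 42,323, which is close to the 382 reported; the small difference presumably comes from rounding or from a slightly different form of the formula.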

Results and discussion

Among the variables of the urban vitality index, urban access had the highest score and urban furniture the lowest. The sense of belonging in the city of Qaen is at a moderate level, with only the item "this is where I want it" somewhat lower. Thus, the satisfaction of the citizens of Qaen is at a moderate level, and none of the sense-of-belonging items is at a very high level. Two methods, K-nearest neighbor and random forest, were used to model the relationship between the independent variable (urban vitality) and the dependent variable (sense of belonging). Performance evaluation of the K-NN model shows that it simulated the sense of belonging with a correlation coefficient of 0.82, an error rate of 0.66, and a risk estimate of 0.435. Therefore, this model was able to correctly simulate the sense of belonging in 66% of cases. Other statistical indices show that the model overall simulates the sense of belonging at less than the observed value; that is, it shows a negative bias. This bias is not large enough to distort the model's results: the mean and standard deviation of the simulated sense-of-belonging variable differ very little from the observed values. For the random forest model, the validation criteria show that with 200 trees the best results are obtained, with the lowest modeling error and a stable, consistent error rate. The random forest model simulated the sense of belonging from the independent variables with reported error values of 0.82 and 0.68, in 32% of cases. The bias in this modeling process is negligible. It is noteworthy that, according to the coefficient of residual mass (CRM), although the model overall has a negligible bias, the simulated values are larger than the observed values.
However, fitting the observed and simulated sense-of-belonging values and comparing the means and standard deviations of the two data sets show that the random forest model failed to estimate the upper and lower values of the sense-of-belonging variable with only slight error, as was also the case for the K-NN model. In general, the K-NN model exhibits much higher flexibility than the RF model in simulating the sense of belonging. According to the model outputs, the K-nearest neighbor method had better results than the random forest model. The K-NN model simulated the sense of belonging with a correlation coefficient of 0.82, an error rate of 0.66, and an estimated risk of 0.435. Therefore, this model was able to correctly simulate the sense of belonging in 66% of cases.
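The comparison above relies on a correlation coefficient, an error measure, and the coefficient of residual mass (CRM) as a bias indicator. The sketch below shows how such indices are commonly computed; the abstract does not define its exact error measure, so RMSE is used here as an assumption, and the CRM sign convention (positive when the model underestimates) is the usual one rather than something confirmed by the source. All function names and the toy data are hypothetical.

```python
import math


def pearson_r(observed, simulated):
    """Pearson correlation coefficient between observed and simulated values."""
    mean_o = sum(observed) / len(observed)
    mean_s = sum(simulated) / len(simulated)
    cov = sum((o - mean_o) * (s - mean_s) for o, s in zip(observed, simulated))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_s = sum((s - mean_s) ** 2 for s in simulated)
    return cov / math.sqrt(var_o * var_s)


def rmse(observed, simulated):
    """Root mean squared error (assumed error measure)."""
    squared = [(o - s) ** 2 for o, s in zip(observed, simulated)]
    return math.sqrt(sum(squared) / len(squared))


def crm(observed, simulated):
    """Coefficient of residual mass: positive when the model underestimates
    on average, negative when it overestimates (assumed convention)."""
    return (sum(observed) - sum(simulated)) / sum(observed)


# Hypothetical survey scores and model outputs.
observed = [2.0, 4.0, 6.0]
simulated = [1.0, 3.0, 5.0]
print(pearson_r(observed, simulated), rmse(observed, simulated), crm(observed, simulated))
```

In this toy case the simulated values track the observed ones perfectly in shape but sit one unit lower, so the correlation is high while the CRM reports a systematic underestimate. That is the kind of distinction the abstract draws between fit and bias.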

Conclusion

According to the model created by the random forest algorithm, the variables of accessibility, social dynamics, economic vitality, urban security, passage design, furniture, lighting of urban spaces, urban landscape, and readability have, in that order, the most to the least effect on the sense of belonging in Qaen. Accessibility has the greatest impact because of the city's adequate passageways, while variables such as simplicity, urban landscape, and readability have the least impact because they are weak in the city. The vitality index consists of 28 items, of which 17 have averages that are not at an appropriate level and 11 are at an acceptable level. The sense of belonging is at a moderate level. One reason for the weakness of urban vitality in Qaen is the city's weak economy, which is in turn due to the lack of appropriate potential for private sector investment.

Keywords: urban vitality, sense of place attachment, K-nearest neighbor, random forest algorithm, Qaen city

بررسی سودمندی روش انتخاب متغیر ریلیف در بهبود نتایج پیش بینی فرار مالیاتی با استفاده از داده کاوی

محمد نمازی*، محمد صادق زاده مهارلویی

مجله پژوهش های کاربردی در گزارشگری مالی، پیاپی 13 (پاییز و زمستان 1397)، صص 7 -44

پژوهش حاضر به بررسی سودمندی روش های ریلیف و داده کاوی در پیش بینی فرار مالیاتی شرکت های پذیرفته شده در بورس اوراق بهادار تهران، با استفاده از داده های حسابداری و الگوهای درخت تصمیم، در دو حالت بدون انتخاب متغیرها و با انتخاب متغیرها، می‎پردازد. جامعه آماری پژوهش حاضر کلیه شرکت‎های پذیرفته شده در بورس اوراق بهادار تهران در بازه زمانی 1384 تا 1394 است و نمونه پژوهش برابر با 1081 سال شرکت می باشد. از روش‎های آماری تحلیل واریانس یک طرفه، آزمون t-test نمونه های مستقل، الگوریتم های داده‎کاوی درخت تصمیم و روش انتخاب متغیر ریلیف برای تحلیل داده ها استفاده شد. داده های پژوهش با استفاده از نرم‎افزارهای SPSS و Weka مورد تجزیه و تحلیل آماری قرار گرفتند. نتایج حاصل از الگوریتم ریلیف نشان داد که متغیرهای نسبت سود عملیاتی به جمع دارایی ها، نسبت بازده دارایی ها و ارزش بازار شرکت برای پیش بینی فرار مالیاتی مناسب تر از سایر متغیرها هستند. همچنین، نتایج آزمون تحلیل واریانس نشان داد که تفاوت در دقت پیش‎بینی روش‎های مختلف درخت تصمیم از لحاظ آماری نیز معنادار است. افزون بر این، نتایج نشان داد در هنگام مقایسه هر یک از الگوریتم ها به تنهایی در دوحالت با و بدون مرحله انتخاب متغیر، تفاوت تنها در الگوریتم LMT معنادار بود و در سایر الگوریتم ها، اگرچه دقت نتایج بهتر شده بود، اما این دقت از لحاظ آماری معنادار نبود. به عبارت دیگر، استفاده از روش انتخاب متغیر ریلیف، در هر حالتی موجب به بهبود عملکرد الگوریتم ها نمی شود.

کلید واژگان: پیش بینی فرار مالیاتی، نسبت های مالی، الگوریتم درخت تصادفی، الگوریتم جنگل تصادفی، ریلیف، داده کاوی

Investigating the Usefulness of Relief Selection Variable Method in Improving Tax Evasion Prediction Outcomes Using Data Mining

Mohammad Namazi *, Mohammad Sadeghzadeh Maharloei

Applied Research in Financial Reporting, Volume:7 Issue: 13, 2019, PP 7 -44

The present study examines the usefulness of the Relief method and data mining in predicting tax evasion by companies listed on the Tehran Stock Exchange (TSE), using accounting data and decision tree models in two settings: with and without a variable selection phase. The statistical population includes all companies listed on the TSE from 2005 to 2015, and the research sample comprised 1,081 company-years. One-way ANOVA, the independent-samples t-test, decision tree algorithms, and the Relief variable selection method were used for data analysis. The data were analyzed using the SPSS and Weka software packages. The results of the Relief algorithm showed that the ratio of operating profit to total assets, the return on assets ratio, and the market value of the company are more suitable than other variables for predicting tax evasion. In addition, the results of one-way ANOVA showed that the difference in prediction accuracy among the decision tree methods is statistically significant. However, when each algorithm was compared with itself in the two settings, with and without the variable selection phase, the difference was significant only for the LMT algorithm. For the other algorithms, the results improved, but not to a statistically significant degree. In other words, using the Relief method does not improve the results of every algorithm.
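The idea behind Relief-style variable selection can be sketched in a few lines: a feature gains weight when it differs between an instance and its nearest neighbor of the other class, and loses weight when it differs between the instance and its nearest neighbor of the same class. The code below is a simplified variant under stated assumptions (two classes, numeric features already scaled to [0, 1], absolute differences, every instance visited once instead of random sampling); it is not the Weka implementation the authors used, and the toy data are hypothetical.

```python
def relief_weights(rows, labels):
    """Simplified Relief feature weights for a two-class problem."""
    n_features = len(rows[0])
    m = len(rows)
    weights = [0.0] * n_features

    def distance(a, b):
        # Manhattan distance over all features
        return sum(abs(x - y) for x, y in zip(a, b))

    for i, (row, label) in enumerate(zip(rows, labels)):
        others = [(r, y) for j, (r, y) in enumerate(zip(rows, labels)) if j != i]
        # nearest neighbor with the same class (hit) and the other class (miss)
        hit = min((r for r, y in others if y == label), key=lambda r: distance(row, r))
        miss = min((r for r, y in others if y != label), key=lambda r: distance(row, r))
        for f in range(n_features):
            weights[f] += (abs(row[f] - miss[f]) - abs(row[f] - hit[f])) / m
    return weights


# Hypothetical data: feature 0 separates the classes, feature 1 is noise.
rows = [(0.0, 0.5), (0.1, 0.9), (0.2, 0.1), (0.8, 0.4), (0.9, 0.8), (1.0, 0.2)]
labels = [0, 0, 0, 1, 1, 1]
print(relief_weights(rows, labels))
```

Features with higher weights would be kept for the downstream classifier. In this toy data the informative feature receives a positive weight and the noise feature a negative one.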

Keywords: Tax Evasion Prediction, Financial Ratios, Random Tree Algorithm, Random Forest Algorithm, Relief, Data mining

پیش بینی روند قیمت در بازار سهام با استفاده از الگوریتم جنگل تصادفی

الهام غلامیان، سید محمدرضا داودی *

نشریه مهندسی مالی و مدیریت اوراق بهادار، پیاپی 35 (تابستان 1397)، صص 301 -322

فعالان بورس درصدد دستیابی و به کارگیری روش هایی هستند تا بتوانند با پیش بینی آتی قیمت سهام، سود سرمایه خود را افزایش دهند .بنابراین، ضروری به نظر می رسد که روش های مناسب، صحیح و متکی به اصول علمی در تعیین قیمت آینده سهام فرآروی افراد سرمایه گذار قرار گیرد. تاکنون روش های مختلفی جهت نیل به این هدف معرفی شده اند که اغلب روش های آماری و هوش مصنوعی هستند. در پژوهش حاضر با استفاده از رویکرد جنگل تصادفی که در زمره روش های طبقه بندی هوش مصنوعی می باشد، به همراه شاخص های فنی: شاخص قدرت نسبی قیمت، استوکاستیک، حجم تعادل موازنه شده، ویلیامز R%، بازده ی روزانه و شاخص سری مک دی به دنبال پیش بینی روند قیمت در بازار سهام و مقایسه آن با روش های موجود است. نتیجه ی پژوهش بر روی داده های روزانه شاخص بورس اوراق بهادار تهران در سالهای 1393 تا 1395 نشان می دهد که دقت روش پیشنهادی در برآورد روند بازار 64 درصد می باشد و نسبت به دو روش مقایسه شده رگرسیون لجستیک و روش کاملا تصادفی از دقت بالاتری برخوردار است.

کلید واژگان: الگوریتم جنگل تصادفی، پیش بینی قیمت سهام، آنتروپی، شاخص های تکنیکی، رگرسیون لجستیک

Predicting the Direction of Stock Market Prices Using Random Forest

Elham Gholamian, Sayyed Mohammad Reza Davoodi *

Financial Engineering and Portfolio Management, Volume:9 Issue: 35, 2018, PP 301 -322

Stock market participants seek to acquire and apply methods that let them increase their capital gains by predicting future stock prices. It therefore seems necessary to give investors appropriate, sound, and scientifically grounded methods for determining future stock prices. Stock price prediction is an important part of investment and a frequent subject of research, because it ultimately leads to appropriate investment choices. Various methods have been introduced to achieve this goal, most of them statistical or artificial intelligence methods. This research uses the random forest approach, one of the artificial intelligence classification methods, together with technical indicators (the Relative Strength Index, Stochastic oscillator, On-Balance Volume, Williams %R, daily return, and MACD) to predict the direction of stock market prices. The model is compared with logistic regression and with a completely random method (a dice throw). The results on daily data of the Tehran Stock Exchange index from 1393 to 1395 (Solar Hijri calendar) indicate that the accuracy of the proposed method in estimating the market trend is 64%, which is higher than that of both logistic regression and the completely random method.
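Two of the technical indicators named above can be computed directly from price series, which shows what kind of input features such a classifier receives. The sketch below assumes the textbook definitions of Williams %R and a simple-average RSI (not Wilder's smoothed version); the abstract does not state the look-back periods or variants used, so the default period of 14 and the sample prices are assumptions.

```python
def williams_r(highs, lows, closes, period=14):
    """Williams %R over the last `period` bars, ranging from -100 to 0."""
    if len(closes) < period:
        raise ValueError("not enough data for the requested period")
    highest = max(highs[-period:])
    lowest = min(lows[-period:])
    if highest == lowest:
        return 0.0  # flat range: value is undefined, 0.0 chosen by convention
    return -100.0 * (highest - closes[-1]) / (highest - lowest)


def rsi(closes, period=14):
    """Relative Strength Index using simple sums of gains and losses."""
    if len(closes) < period + 1:
        raise ValueError("not enough data for the requested period")
    recent = closes[-(period + 1):]
    changes = [b - a for a, b in zip(recent[:-1], recent[1:])]
    gains = sum(c for c in changes if c > 0)
    losses = sum(-c for c in changes if c < 0)
    if gains == 0 and losses == 0:
        return 50.0  # no movement at all: neutral value by convention
    if losses == 0:
        return 100.0
    return 100.0 - 100.0 / (1.0 + gains / losses)


# Hypothetical daily bars.
highs = [10.0, 12.0, 11.0]
lows = [8.0, 9.0, 10.0]
closes = [9.0, 11.0, 10.5]
print(williams_r(highs, lows, closes, period=3))
print(rsi([10.0, 11.0, 10.0, 11.0, 10.0], period=4))
```

Each trading day would yield one row of such indicator values, with the label being whether the index rose or fell over the prediction horizon.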

Keywords: Random forest algorithm, stock price prediction, entropy, technical indicators, logistic regression

بررسی امکان تهیه نقشه خطر زمین لغزش با استفاده از الگوریتم جنگل تصادفی (محدوده ی موردمطالعه: حوزه آبخیز سردارآباد، استان لرستان)

علی طالبی *، سحر گودرزی، حمیدرضا پورقاسمی

مجله مخاطرات محیط طبیعی، پیاپی 16 (تابستان 1397)، صص 45 -64

با توجه به توانایی تکنیک های داده کاوی، کاربرد آن ها در رشته های مختلف مهندسی و علوم زمین گسترش فراوانی داشته است. هدف از پژوهش حاضر پهنه بندی حساسیت زمین لغزش با استفاده از الگوریتم جنگل تصادفی، در حوزه آبخیز سردارآباد در شهرستان خرم آباد، استان لرستان است. جنگل های تصادفی یک نوع مدرن از درخت- پایه هستند که شامل انبوهی از درخت های کلاس بندی و رگرسیونی می باشند. الگوریتم جنگل تصادفی مبتنی بر دسته ای از درخت های تصمیم است و در حال حاضر یکی از بهترین الگوریتم های یادگیری است. برای انجام پژوهش حاضر لایه های اطلاعاتی درجه شیب، جهت شیب، ارتفاع از سطح دریا، شکل شیب، فاصله از گسل، فاصله از آبراهه، فاصله از جاده، بارندگی، لیتولوژی و کاربری اراضی به عنوان عوامل موثر بر وقوع زمین لغزش شناسایی و نقشه های آن در نرم افزار ArcGIS10.2 رقومی و تهیه گردیدند. سپس با استفاده از الگوریتم جنگل تصادفی، ارتباط بین عوامل موثر و موقعیت زمین لغزش ها و وزن هر یک از آن ها در نرم افزار آماری R محاسبه و درنهایت جهت تهیه نقشه حساسیت زمین لغزش منطقه موردمطالعه به محیط GIS منتقل گردید. نتایج ارزیابی دقت روش پهنه بندی با استفاده از منحنی تشخیص عملکرد نسبی و 30 درصد نقاط لغزشی استفاده نشده در فرآیند مدل سازی، بیان گر دقت عالی مدل جنگل تصادفی با سطح زیر منحنی 8/98 درصد است. هم چنین بر اساس الگوریتم جنگل تصادفی، عوامل لیتولوژی، فاصله از جاده و فاصله از رودخانه به ترتیب بیش ترین تاثیر را در وقوع زمین لغزش حوزه آبخیز سردارآباد داشته اند.

کلید واژگان: الگوریتم جنگل تصادفی، حوزه آبخیز سردارآباد، زمین لغزش، منحنی راک

Investigation of the possibility of landslide hazard mapping using the Random Forest algorithm (Case study: Sardarabad Watershed, Lorestan Province)

Ali Talebi *, Sahar Goudarzi, Hamid Reza Pourghsemi

Journal of Natural environment hazards, Volume:7 Issue: 16, 2018, PP 45 -64

Owing to the capabilities of data mining techniques, their applications in various engineering and geoscience disciplines have expanded. In this study, the random forest algorithm was used for landslide susceptibility mapping in the Sardarabad Watershed, Lorestan Province. Random forest is a popular and very efficient algorithm, based on model aggregation ideas, for both regression and classification problems. The method combines bagging with random feature selection. For this purpose, layers of slope, aspect, elevation, curvature, distance from the fault, distance from the river, distance from the road, rainfall, lithology, and land use were prepared as the factors influencing landslides, and their maps were digitized in ArcGIS 10.2. Areas susceptible to landslides were then evaluated using the random forest algorithm, which was implemented in R, and ROC curves were used to evaluate the model. Based on the results for the study area, the accuracy of the random forest algorithm (area under the ROC curve) is 98.8%. Overall, the random forest algorithm indicates that lithology and distance to roads are the main factors in landslide occurrence.
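The combination of bagging and random feature selection described in this abstract can be shown with a deliberately tiny forest of one-split trees (decision stumps). This is a teaching sketch under strong simplifications, not the R implementation the authors used: real random forests grow deep trees and re-draw the feature subset at every split. The function names, the three toy features (slope, distance to road, rainfall), and all values are hypothetical.

```python
import random
from collections import Counter


def fit_stump(rows, labels, feature_ids):
    """Pick the (feature, threshold) split with the fewest training errors."""
    best = None
    for f in feature_ids:
        for threshold in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= threshold]
            right = [y for r, y in zip(rows, labels) if r[f] > threshold]
            if not left or not right:
                continue
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0]
            errors = sum(y != left_label for y in left)
            errors += sum(y != right_label for y in right)
            if best is None or errors < best[0]:
                best = (errors, f, threshold, left_label, right_label)
    if best is None:  # no usable split: always predict the majority class
        majority = Counter(labels).most_common(1)[0][0]
        return (feature_ids[0], float("inf"), majority, majority)
    return best[1:]


def fit_forest(rows, labels, n_trees=25, n_features=2, seed=0):
    rng = random.Random(seed)
    all_features = list(range(len(rows[0])))
    forest = []
    for _ in range(n_trees):
        # bagging: draw a bootstrap sample with replacement
        idx = [rng.randrange(len(rows)) for _ in range(len(rows))]
        # random feature selection: each tree sees only a feature subset
        feats = rng.sample(all_features, n_features)
        forest.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx], feats))
    return forest


def predict(forest, row):
    """Majority vote over all trees."""
    votes = Counter(left if row[f] <= t else right for f, t, left, right in forest)
    return votes.most_common(1)[0][0]


# Hypothetical cells: (slope in degrees, distance to road in m, rainfall in mm)
rows = [(10, 500, 300), (15, 400, 320), (20, 450, 310),
        (35, 100, 500), (40, 50, 520), (45, 80, 510)]
labels = [0, 0, 0, 1, 1, 1]  # 1 = landslide observed
forest = fit_forest(rows, labels)
print(predict(forest, (8, 600, 280)), predict(forest, (50, 30, 550)))
```

Because each tree is trained on a different resample and a different feature subset, individual trees disagree, and the vote averages out their errors. That aggregation is what the abstract refers to as model aggregation.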

Keywords: Random Forest algorithm, watershed Sardarabad, landslides, ROC curve

Forecasting Stock Trend by Data Mining Algorithm

Sadegh Ehteshami *, Mohsen Hamidian, Zohreh Hajiha, Serveh Shokrollahi

Advances in Mathematical Finance and Applications, Volume:3 Issue: 1, Winter 2018, PP 97 -105

Stock trend forecasting is one of the main factors in choosing the best investment; hence, predicting and comparing the stock trends of different firms is one way to improve the investment process. Stockholders need information for forecasting firms' stock trends in order to make decisions about trading those stocks. In this study, stock trend forecasting is performed with data mining algorithms. The research has two hypotheses; it is applied in purpose, correlational in methodology, and deductive in reasoning. The hypotheses were analyzed using data collected from 180 firms listed on the Tehran Stock Exchange during 2009-2015. The results indicate that the algorithms are able to forecast negative stock returns; however, the random forest algorithm is more powerful than the decision tree algorithm. In addition, stock returns over the last three years and sales growth are the main variables in forecasting negative stock returns.

Keywords: Stock trend forecasting, Random forest algorithm, Decision tree algorithm

