hierarchical clustering
in journals of the Technical and Engineering group -
Demand response is now recognized as an important element of smart grid reliability. Smart home energy management systems, which prioritize the start-up of electrical appliances according to necessity of use and efficiency, play a vital role in the effectiveness of demand response strategies in residential areas. With sensor technologies available, itemizing electricity consumption details in bills helps monitor how appliances are used.
In this research, an unsupervised machine learning model is proposed for clustering home appliances by their inherent characteristics in order to manage customers' electricity bills. Consumption details can then be reported per appliance cluster in the bill for each period, and because the number of clusters is small, managing and monitoring electricity consumption becomes feasible. Hierarchical clustering was used to group the appliances into three clusters. The first cluster contains appliances that are turned on immediately at the consumer's discretion; the second contains appliances that run on a schedule and whose use can be postponed; the third contains appliances preferred by a limited number of consumers. The silhouette coefficient was used to evaluate the hierarchical clustering model, and the average value of 0.56 indicates satisfactory performance. The results show that the proposed clustering method, given appropriately selected features, can rationally classify different types of home appliances: appliances within a cluster are similar to one another, which can help users understand how the appliances operate.
Keywords: Smart Grid, Home Energy Management System, Demand Response, Hierarchical Clustering
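The clustering and evaluation steps described in this abstract can be illustrated with a minimal sketch. It assumes average-linkage agglomerative clustering and Euclidean distance; the abstract does not state which linkage or features were used, and the appliance names and feature values below are hypothetical.

```python
from itertools import combinations
from math import dist


def average_linkage(points, n_clusters):
    """Naive agglomerative clustering: repeatedly merge the two clusters
    with the smallest mean pairwise distance until n_clusters remain."""
    clusters = [[i] for i in range(len(points))]

    def gap(pair):
        a, b = pair
        total = sum(dist(points[i], points[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > n_clusters:
        a, b = min(combinations(clusters, 2), key=gap)
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a + b)

    labels = [0] * len(points)
    for k, members in enumerate(clusters):
        for i in members:
            labels[i] = k
    return labels


def mean_silhouette(points, labels):
    """Average silhouette coefficient; singleton clusters score 0."""
    scores = []
    for i, p in enumerate(points):
        own = [dist(p, q) for j, q in enumerate(points)
               if labels[j] == labels[i] and j != i]
        if not own:
            scores.append(0.0)
            continue
        a = sum(own) / len(own)  # mean distance to the point's own cluster
        b = min(                 # mean distance to the nearest other cluster
            sum(dist(p, q) for j, q in enumerate(points) if labels[j] == k)
            / labels.count(k)
            for k in set(labels) if k != labels[i]
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Hypothetical features: (normalized power, deferrable?, share of households)
appliances = {
    "lamp": (0.1, 0.0, 0.9), "tv": (0.2, 0.0, 0.9),
    "washer": (0.6, 1.0, 0.8), "dishwasher": (0.5, 1.0, 0.7),
    "sauna": (0.9, 0.5, 0.1), "pool_pump": (0.8, 0.5, 0.1),
}
features = list(appliances.values())
labels = average_linkage(features, n_clusters=3)
print(dict(zip(appliances, labels)), round(mean_silhouette(features, labels), 2))
```

A silhouette value close to 1 means tight, well-separated clusters, so a reported average of 0.56 would indicate reasonable but not perfect separation.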
-
The aim of this research is to design a Total Productive Maintenance (TPM) management model using a hybrid method of artificial neural networks and hierarchical clustering in power distribution companies of northwestern Iran. This study is conducted in power distribution companies of northwestern Iran, which were selected as pilot companies. Determining the optimal maintenance strategy and selecting the best management model for maintenance is of great importance. The findings of this study will be provided to Tavanir and the Ministry of Energy for further implementation in other subsidiary companies. In terms of location, the quantitative data pertains to operational data of the power distribution companies in northwestern Iran. The statistical population includes experts and personnel from the maintenance, repair, and warehousing departments of these companies. The temporal data pertains to operational data from the inventory, accounting, and process systems of power distribution companies in northwestern Iran, spanning from 2017 to 2022. The results of this research indicate that initiating the Total Productive Maintenance process requires strong managerial leadership. Subsequently, processes should be improved and undergo initial feedback evaluations. By considering the strength of human resources and enhancing employee skills, the quality of work processes will be analyzed. As these factors evolve, the system will undergo precise organization and planning. Comprehensive preventive maintenance will ensure workplace safety and health are prioritized. Another aspect that management must address is the advancement of technology and the expansion of automation systems, especially in the implementation of equipment and inventory management subsystems and resource and contract management, which are key priorities of the model. Finally, management must focus on adopting preventive maintenance, self-controlled maintenance, and re-evaluating current practices. 
Employees should be engaged in achieving these three goals.
Keywords: Hierarchical Clustering, Power Distribution Companies Of Northern Iran, Artificial Neural Networks, Total Productive Maintenance (TPM) Management -
In exploration projects, identifying geochemical anomalies can be complicated by geological processes. Resolving these ambiguities requires several complementary methods so that the available information is interpreted correctly.
This research combines hierarchical clustering (to identify elements related to mineralization), singularity mapping based on multifractal models, and the support vector machine (SVM) method to separate anomalous areas with potential mineralization from background areas. First, hierarchical clustering with Ward's method identified gold and copper as the elements related to mineralization. The singularity index of these two elements was then calculated at each point using the window-based method and the grade-area power-law relation. Finally, the singularity index values were split into training and test sets, and SVM was used to classify and estimate singularity index values in order to identify anomalous zones in unsampled areas. The case study uses surface soil samples from the gold-rich Dali porphyry copper deposit, covering an area of 900 × 800 m in the Urmia-Dokhtar magmatic belt. The results of the hybrid method agree well with previous studies in the region. These hybrid methods can therefore serve as a useful guide for producing geochemical maps in unexplored areas.
Keywords: Hierarchical clustering, singularity, Support vector machine, Porphyry copper, Dali, Urmia-Dokhtar
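The window-based estimate of the singularity index mentioned in this abstract can be sketched as follows. This is a minimal illustration, assuming the commonly used 2-D power law in which the mean concentration C in a square window of side ε scales as C(ε) ∝ ε^(α−2), so that α is 2 plus the slope of log C against log ε. The function name, grid layout, and window sizes are hypothetical and not taken from the paper.

```python
from math import log


def singularity_index(grid, row, col, half_widths=(1, 2, 3)):
    """Estimate alpha at grid[row][col] from square windows of growing size.

    Assumes strictly positive concentrations (the logarithm is undefined
    otherwise). Windows are clipped at the grid border.
    """
    xs, ys = [], []
    for r in half_widths:
        cells = [
            grid[i][j]
            for i in range(max(0, row - r), min(len(grid), row + r + 1))
            for j in range(max(0, col - r), min(len(grid[0]), col + r + 1))
        ]
        xs.append(log(2 * r + 1))                # log of window side
        ys.append(log(sum(cells) / len(cells)))  # log of mean concentration
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    # Ordinary least-squares slope of log C against log epsilon.
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    return 2 + slope


# A local enrichment (one high value on a flat background) gives alpha < 2.
soil = [[1.0] * 7 for _ in range(7)]
soil[3][3] = 100.0
print(singularity_index(soil, 3, 3))
```

Under this convention, α below 2 marks local enrichment (a candidate anomaly), α near 2 marks background, and α above 2 marks depletion.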
-
Journal of Operation and Automation in Power Engineering, Volume 11, Issue 3 (Autumn 2023), pp. 182-192
With the development of electrical network infrastructure and the emergence of concepts such as demand response and the use of electric vehicles for purposes other than transportation, understanding the behavioral patterns of the network's technical characteristics has become very important for the optimal management of electrical systems. One of the critical parameters in power system management is distribution network imbalance. There are several ways to improve and control network imbalance; one of them is to detect the behavior of bus imbalance profiles using data analysis. In the past, such analysis was performed for large areas such as states and countries. Since the emergence of smart grids, however, studying and recognizing these patterns in small-scale environments has taken on an essential role in the detailed management of these networks. Data mining is a suitable method for identifying behavioral patterns. This paper uses hierarchical and k-means clustering to identify the behavioral pattern of the imbalance index in an unbalanced distribution network. First, the imbalance profile of every bus is estimated in an unbalanced network without electric vehicle parking lots. Then, with electric vehicle penetration levels of 25% and 75%, the effect of charging and discharging on the imbalance profile is determined. Next, the target cluster is identified and demand response is used to improve the imbalance index. This method reduces the number of buses participating in demand response programs. Finally, using classification, a decision tree is constructed to reduce metering time.
Keywords: Classification, Data Mining, decision tree, demand response, hierarchical clustering, k-means, Electric Vehicle, unbalanced distribution network. -
Journal of Optimization in Industrial Engineering, Volume 16, Issue 34 (Winter and Spring 2023), pp. 119-127
With global warming and energy shortages, smart grids have become a significant topic in power systems, and demand response is one of their basic components. To make demand response more efficient, an intelligent home appliance control system is essential, one that prioritizes the start-up of electrical appliances according to necessity of use and efficiency. To manage demand response properly, utilities use different signals such as price. One pricing method that can be considered is different pricing for different clusters of electrical appliances. In this article, appliances are clustered with K-means and hierarchical clustering based on characteristics of the appliances themselves, such as their level of consumption, type of use, mode of operation, the ability to change their operating conditions, and their time of use. K-means appears to outperform the hierarchical method for this problem, as indicated by its lower Davies-Bouldin (DB) index. With this method, home appliances were grouped into three clusters. The silhouette coefficient was used to evaluate the K-means clustering model, and the average value of 0.6 indicates satisfactory performance. The results show that the proposed clustering method, given appropriately selected features, can rationally classify different types of home appliances: appliances within a cluster are similar to one another, which can help users understand the operating conditions of the appliances.
Keywords: Appliance, Demand Response, k-means clustering, Hierarchical Clustering
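The comparison by DB coefficient described in this abstract can be illustrated with a short sketch of the Davies-Bouldin index, assuming that is what "DB" refers to. The function name and the toy points are hypothetical. A lower value means more compact, better separated clusters.

```python
from math import dist


def davies_bouldin(points, labels):
    """Davies-Bouldin index of a labelled partition (lower is better)."""
    groups = {}
    for p, k in zip(points, labels):
        groups.setdefault(k, []).append(p)
    # Centroid of each cluster, computed coordinate by coordinate.
    centroids = {k: tuple(sum(c) / len(g) for c in zip(*g))
                 for k, g in groups.items()}
    # Scatter: mean distance of a cluster's points to its centroid.
    scatter = {k: sum(dist(p, centroids[k]) for p in g) / len(g)
               for k, g in groups.items()}
    # For each cluster, take its worst (largest) similarity ratio.
    worst = [
        max((scatter[i] + scatter[j]) / dist(centroids[i], centroids[j])
            for j in groups if j != i)
        for i in groups
    ]
    return sum(worst) / len(worst)


points = [(0, 0), (0, 1), (5, 0), (5, 1)]
print(davies_bouldin(points, [0, 0, 1, 1]))  # compact, separated clusters
print(davies_bouldin(points, [0, 1, 0, 1]))  # stretched, overlapping clusters
```

Computing this index for the labels produced by two algorithms on the same data gives the kind of comparison the abstract reports.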
-
We live in what is known as the age of communication. As the volume of information on the internet grows and more news is published on news agency websites and other sources, users find it increasingly difficult to locate the information and news they want. Recommender systems address this problem by automatically finding the news and information that match a user's interests and suggesting it to them. This article attempts to improve user satisfaction by refining a content-based recommender system so that it suggests better sources to its users. A clustering approach is used for this improvement. A cluster threshold is defined for grouping similar news items in the K-means clustering algorithm. By finding the best value of the similarity criterion and using an external knowledge base (ontology), words are generalized into sets of related words instead of being used alone. This approach improves the accuracy of news clustering, and the resulting clusters are used to find a user's favorite news and to recommend news to the user. Although the dataset plays an important and influential role in recommender systems, no standard Persian dataset has been published so far. This research therefore also collects and publishes a dataset to fill this gap. The data were collected by crawling the Tabnak news agency website over 8 days. A profile was created and saved for each volunteer while they read their favorite news during that period. The analysis shows that the proposed clustering approach reaches 70.2% on our dataset according to the NMI criterion. In addition, the recommender system based on the proposed clustering achieves 89.2% on the accuracy criterion, an improvement of 8.5% over the standard method.
Keywords: Recommender system, Persian news, Hierarchical clustering, Ontology -
Although there is evidence that the family has a significant impact on children's safety and risky behaviors, few studies have examined this issue in detail. Children under the age of 10 rarely participate in traffic completely independently, yet they are a vulnerable population from a traffic safety perspective. In addition to the children who die in incidents, some suffer lifelong disabilities. Besides educational methods and safe school zones, attention must be paid to the influence of parents on children's understanding of traffic safety. This study investigates the effect of parents on children's perception of safety and danger on roads. For this purpose, children aged 6-9 years were asked in an interview to identify 11 unsafe traffic behaviors. Parenting styles and demographic information were collected from their parents through a questionnaire. The results showed that children's risk perception is related to age, gender, and socio-economic status. In addition, children's ability to perceive risk is associated with negative parenting styles (corporal punishment and poor monitoring). The results highlight the effect of parents' education on children's awareness of road safety. Families should be informed about the importance of parenting styles and the other factors that affect children's understanding of traffic risks. It is also important to establish the infrastructure needed to increase children's safety by promoting parenting skills through beneficial policies and training workshops for parents.
Keywords: children, parenting styles, risk perception, Hierarchical clustering, decision tree
-
Accurate and valid data are prerequisites for the correct operation of any software system. Errors can always occur in data because of human and system faults, and one such error is the existence of duplicate records in data sources. Duplicate records refer to the same real-world entity. Only one of them should exist in a data source, but for reasons such as the merging of data sources and human error in data entry, several copies of an entity may appear. This leads to errors in a system's operations or outputs and is costly for the organization or business concerned. Data cleaning, and duplicate record detection in particular, has therefore become one of the most important areas of computer science in recent years. Many solutions have been presented for detecting duplicates in different situations, but almost all of them are time-consuming, and because the volume of data grows every day, earlier methods no longer perform well enough. A second problem in recent work is the incorrect detection of two different records as duplicates. This matters because duplicates are usually deleted, so correct data is lost. New methods are therefore needed.
This paper proposes a method that reduces the amount of processing required by using hierarchical clustering with appropriate features. The similarity between records is estimated at several levels, with a different feature used at each level. As a result, the clusters at the last level contain records that are very similar to one another, and comparisons for detecting duplicates are carried out only among records within the same last-level cluster. The paper also proposes a relative similarity function, based on the edit distance function, for comparing records.
This function determines similarity with high precision. The evaluation results show that the proposed method detects 90% of duplicate records with 97% accuracy in less time, an improvement over earlier results.
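The idea of comparing records only inside a cluster, using an edit-distance-based similarity, can be sketched as follows. This is an illustration under stated assumptions rather than the paper's function: similarity is taken as 1 minus the Levenshtein distance divided by the longer string's length, a single grouping level stands in for the multi-level hierarchy, and `block_key` and the threshold are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def relative_similarity(a, b):
    """Similarity in [0, 1]: 1 for identical strings, 0 for no overlap."""
    if not a and not b:
        return 1.0
    return 1 - edit_distance(a, b) / max(len(a), len(b))


def find_duplicates(records, block_key, threshold=0.85):
    """Compare only records that share a block; return likely duplicate pairs."""
    blocks = {}
    for record in records:
        blocks.setdefault(block_key(record), []).append(record)
    pairs = []
    for group in blocks.values():
        for i in range(len(group)):
            for j in range(i + 1, len(group)):
                if relative_similarity(group[i], group[j]) >= threshold:
                    pairs.append((group[i], group[j]))
    return pairs


names = ["john smith", "jon smith", "jane doe", "john smyth"]
# Hypothetical blocking feature: first letter of the last word.
print(find_duplicates(names, block_key=lambda r: r.split()[-1][0]))
```

Restricting comparisons to one block avoids the quadratic cost of comparing every record with every other, which is where the time saving described in the abstract comes from.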
Keywords: Duplicate Record Detection, Data Cleaning, Hierarchical Clustering, Similarity Function, Feature Selection -
In this work, a hierarchical ensemble of projected clustering algorithm for high-dimensional data is proposed. The basic concept of the algorithm is based on the active learning method (ALM) which is a fuzzy learning scheme, inspired by some behavioral features of human brain functionality. High-dimensional unsupervised active learning method (HUALM) is a clustering algorithm which blurs the data points as one-dimensional ink drop patterns, in order to summarize the effects of all data points, and then applies a threshold on the resulting vectors. It is based on an ensemble clustering method which performs one-dimensional density partitioning to produce ensemble of clustering solutions. Then, it assigns a unique prime number to the data points that exist in each partition as their labels. Consequently, a combination is performed by multiplying the labels of every data point in order to produce the absolute labels. The data points with identical absolute labels are fallen into the same cluster. The hierarchical property of the algorithm is intended to cluster complex data by zooming in each already formed cluster to find further sub-clusters. The algorithm is verified using several synthetic and real-world datasets. The results show that the proposed method has a promising performance, compared to some well-known high-dimensional data clustering algorithms.
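The label-combination step described in this abstract, where each one-dimensional partition receives a unique prime and the primes of a data point are multiplied, can be sketched as follows. The one-dimensional density partitioning itself is not reproduced here; the per-dimension partition indices are assumed as input, and the function names are hypothetical.

```python
from itertools import count


def primes():
    """Yield prime numbers in increasing order by trial division."""
    found = []
    for n in count(2):
        if all(n % p for p in found):
            found.append(n)
            yield n


def absolute_labels(partitions):
    """Combine per-dimension partition indices into one label per point.

    partitions[d][i] is the partition index of point i along dimension d.
    Every (dimension, partition) pair gets its own prime, so by unique
    factorization two points share a product only if they fall in the
    same partition along every dimension.
    """
    prime_source = primes()
    labels = [1] * len(partitions[0])
    for part in partitions:
        prime_of = {idx: next(prime_source) for idx in sorted(set(part))}
        for i, idx in enumerate(part):
            labels[i] *= prime_of[idx]
    return labels


# Four points, two dimensions: points 2 and 3 agree in both dimensions.
print(absolute_labels([[0, 0, 1, 1], [0, 1, 1, 1]]))
```

Points with identical products end up in the same cluster, and the same procedure can be applied again inside a cluster to look for sub-clusters.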
Keywords: Ensemble Clustering, High Dimensional Clustering, Hierarchical Clustering, Unsupervised Active Learning Method -
This paper discusses the Semantic Web, the future direction of World Wide Web development. The Web is undoubtedly one of the most important services on the Internet and has had the greatest impact on the spread of the Internet in human societies. Internet penetration has in turn driven the growth of the volume of information on the Web, and this massive growth has caused problems, the most important of which concerns web search. Search engines today use different techniques to deliver high-quality results, but search results are still not ideal, even though information retrieval techniques can increase search accuracy to some extent. Most web content is designed for human use, and machines can only understand and manipulate data at the word level. This is the main obstacle to providing better services to web users. The proposed solution is to represent web content in a way that machines can readily understand; this solution, which will transform the Web, is called the Semantic Web. The purpose of this research is to return better results for searches by Semantic Web users. In the proposed method, the expression searched by the user is first examined according to its related topics. The response obtained from this step enters a ranking system, consisting of a fuzzy decision-making system and a hierarchical clustering system, to return better results to the user. The proposed method does not require any prior knowledge for clustering the data. In addition, the precision and recall of the response are measured. Finally, the F-measure, which is often used as a measure of system performance, is applied to the results to evaluate the algorithm and systems.
The results show that the method presented in this paper gives a more precise and comprehensive response than similar methods and increases precision by up to 1.22% on average.
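The evaluation described in this abstract measures how precise and how complete the returned results are and combines the two into a single F score. A minimal sketch, assuming the balanced F1 variant (the abstract does not say which weighting was used) and hypothetical document identifiers:

```python
def precision_recall_f1(retrieved, relevant):
    """Precision, recall and balanced F-measure for one query."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    # Harmonic mean of precision and recall; defined as 0 when there are no hits.
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


# Hypothetical result list and ground truth for one search query.
print(precision_recall_f1(retrieved=[1, 2, 3, 4], relevant=[2, 4, 5]))
```

Because the harmonic mean is dominated by the smaller of the two values, a method must do well on both precision and recall to score highly.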
Keywords: Semantic Web, Fuzzy Logic, Hierarchical Clustering, Latent Semantic, HFCS -
Sulfur springs are distinguished from other groundwater sources by their specific therapeutic, thermal, and hydrochemical properties. The Golgir, Meydavood, Naft Sefid, Pole Zal, Grasab JaOrdo, Mashrageh, Grab Behbahan, and Baba Ahmad springs are low-temperature sulfur springs in the Zagros area of Khuzestan. These springs have temperatures between 22 and 35 °C and emerge along the Zagros thrust. Hydrochemical data from the sulfur springs, collected in two periods (December 2017 and May 2018), were analyzed with hydrochemical methods, principal component analysis (PCA), and agglomerative hierarchical clustering (AHC) to identify the factors affecting the chemical quality of the springs. The first factor comprises electrical conductivity, Na, Cl, K, Ca, Mg, As, and Cd. This factor results from the interaction between water and aquifer materials, and the high correlation between the major ions, arsenic, and cadmium indicates that these elements have a common origin. The second factor comprises SO4, NO3, and S. According to the results of TOC analysis, this factor can be attributed to the intrusion of oil brines. Piper diagrams and Q-mode hierarchical clustering were used to classify the springs. The Piper diagram separated the springs into three groups, whereas Q-mode hierarchical clustering, which takes more parameters into account, separated them into four groups.
Keywords: Sulfur Spring, Principal component analysis, Hierarchical clustering, Gachsaran formation, Oil Brines -
Detecting the region of brain tumors is a crucial step in automatic diagnosis and treatment systems. This paper presents a hybrid method based on an adaptive neuro-fuzzy inference system (ANFIS) and hierarchical clustering to identify the location and extent of brain tumors. First, the center line of the brain is detected and the region of the two hemispheres is divided into non-overlapping blocks. Intensity and texture features are then extracted for each block. By exploiting the symmetry between the two hemispheres of the brain, blocks containing tumor tissue are recognized with an ANFIS classifier. Finally, the brain magnetic resonance (MRI) image is smoothed and hierarchical clustering is used to determine the exact tumor region. The proposed method was evaluated on MRI images from the Harvard dataset. Its accuracy, sensitivity, and specificity are 98.1±4.7%, 94.1±3.2%, and 98.7±4.9%, respectively.
Keywords: Tumor detection, Neuro-fuzzy inference, Hierarchical clustering, Segmentation, Magnetic resonance images
-
Electronics Industries, Volume 9, Issue 1, 2018, pp. 133-143
Clustering is one of the most important fields of data mining. It aims to divide data into meaningful subsets called clusters. The technique involves finding natural groupings in a data set based on similarities and dissimilarities when little or no prior information about the data is available. Over the decades, many clustering algorithms have been developed following different approaches or combinations of them. This paper presents an algorithm based on the density-based and hierarchical approaches. DBSCAN is one of the algorithms of the density-based approach. It requires two parameters, and determining them remains a major challenge. In the proposed method, the DBSCAN parameters are set without user involvement so that potential clusters are found automatically. Clusters that are close to each other are then merged until the quality of the final clusters is suitably improved, yielding accurate, high-quality clusters. Finally, the new hybrid algorithm was tested on real data sets from the UCI machine learning repository. The results indicate that it is more efficient and more accurate than previous methods and runs at a suitable speed.
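The two-stage idea in this abstract, density-based clustering followed by merging of nearby clusters, can be sketched as follows. The abstract does not describe how the DBSCAN parameters are chosen automatically or which merge criterion is used, so this sketch takes `eps`, `min_pts`, and a merge distance `gap` as plain, hypothetical inputs and merges clusters whose closest points lie within `gap`.

```python
from math import dist

NOISE = -1


def dbscan(points, eps, min_pts):
    """Plain DBSCAN; min_pts counts the point itself. Noise is labelled -1."""
    labels = [None] * len(points)

    def neighbours(i):
        return [j for j in range(len(points))
                if dist(points[i], points[j]) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE          # may later become a border point
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster    # border point: claimed, not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbours(j)
            if len(reachable) >= min_pts:  # core point: keep growing
                queue.extend(reachable)
        cluster += 1
    return labels


def merge_close_clusters(points, labels, gap):
    """Merge clusters whose closest pair of points is at most gap apart."""
    ids = sorted({k for k in labels if k != NOISE})
    parent = {k: k for k in ids}

    def find(k):
        while parent[k] != k:
            k = parent[k]
        return k

    for a in ids:
        for b in ids:
            if a < b and any(
                dist(points[i], points[j]) <= gap
                for i in range(len(points)) if labels[i] == a
                for j in range(len(points)) if labels[j] == b
            ):
                parent[find(b)] = find(a)
    return [find(k) if k != NOISE else NOISE for k in labels]


data = [(0, 0), (0.5, 0), (1, 0), (3, 0), (3.5, 0), (4, 0), (20, 0)]
first = dbscan(data, eps=0.6, min_pts=2)
print(first, merge_close_clusters(data, first, gap=2.5))
```

The merge step here is a single-linkage rule, chosen only because it is the simplest hierarchical criterion; the paper's quality-driven merging may differ.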
Keywords: data mining, combinational clustering, hierarchical clustering, density-based clustering
-
Artificial immune system (AIS) is one of the most important metaheuristic algorithms for solving highly complex problems. With large volumes of data, making rapid decisions and obtaining stable results are among the most challenging tasks because real-world conditions change quickly. Clustering is one possible way to overcome these problems. The goal of cluster analysis is to group similar objects.
AIS can be used for data clustering analysis. Although AIS represents the configuration of the search space well, determining the clusters of a data set directly from its output is very difficult and costly. Accordingly, this paper proposes a two-stage algorithm based on AIS and hierarchical clustering. Hierarchical clustering offers high execution speed and does not require the number of clusters to be specified, but it is sensitive to outliers.
In the first stage, the proposed AIS algorithm explores the search space and determines its configuration, and thereby the outliers. In the second stage, hierarchical clustering determines the clusters and their number. The first stage thus compensates for the weakness of hierarchical clustering, and the second stage resolves the difficulties of AIS.
The proposed algorithm is evaluated with two metrics: (i) execution time and (ii) the sum of squared error (SSE), described here as the average total distance between a cluster's center and its members, which measures the goodness of a clustering structure. Finally, the proposed algorithm was implemented on a real sample of earthquake data from Iran and compared with a similar algorithm, the Improved Ant System-based Clustering algorithm (IASC). IASC is a metaheuristic clustering algorithm based on the Ant Colony System (ACS). It is fast and suitable for dynamic environments. Table 1 shows the results of evaluation.
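The SSE metric mentioned above can be made concrete with a short sketch. It assumes the common textbook definition (the sum of squared Euclidean distances from each point to the centroid of its cluster), which may differ in normalization from the averaged form the abstract describes. The function name `sse` is hypothetical.

```python
from collections import defaultdict


def sse(points, labels):
    """Sum of squared Euclidean distances from every point to the centroid
    of the cluster it is assigned to (assumed standard definition)."""
    clusters = defaultdict(list)
    for p, l in zip(points, labels):
        clusters[l].append(p)

    total = 0.0
    for members in clusters.values():
        # Centroid: coordinate-wise mean of the cluster members.
        centre = [sum(axis) / len(members) for axis in zip(*members)]
        total += sum(sum((x - c) ** 2 for x, c in zip(p, centre))
                     for p in members)
    return total
```

A lower value indicates tighter clusters for a fixed number of clusters. Because the value always shrinks as clusters are added, it is best used to compare clusterings with the same cluster count.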
The results showed that the proposed algorithm covers the drawbacks of both AIS and hierarchical clustering and, in addition, achieves high precision and acceptable running speed.
Keywords: Clustering Analysis, Artificial immune system (AIS), Hierarchical Clustering
-
Motorcycle crashes constitute a significant proportion of traffic accidents worldwide. The aim of this paper was to examine motorcycle crash patterns and rider fault status across the provinces of Iran. For this purpose, 6638 motorcycle crashes that occurred in Iran from 2009 to 2012 were used as the analysis data, and a two-step clustering approach was adopted as the analysis framework. First, hierarchical clustering (HC) was applied to group the provinces into homogeneous clusters, based on the distribution of crash characteristics in each province. In the second step, latent class clustering (LCC) was employed to investigate the crash patterns and rider fault status among the provinces. The provincial groupings were found to be an influential factor in the final crash clusters, which supports the effectiveness of the proposed framework. The LCC results also indicated that Cluster 8, which had the highest percentages of riders not wearing a helmet, unlicensed riders and riders under 21 years old, had the highest percentage of fatal crashes. In addition, motorcyclists seemed to be less responsible in pedestrian-motorcycle crashes. Accordingly, training riders about the risk of pedestrian-motorcycle crashes during the license issuance process could help reduce this type of crash. More generally, analyzing culpability in pedestrian-motorcycle crashes might be a good topic for future research. Further discussions on the crash patterns are provided. Finally, the combined use of HC and LCC should not be regarded as an alternative to other, more qualitative predictive methods, but as a preliminary analysis tool that provides insight into road safety conditions at the national level.
Keywords: Hierarchical clustering, Latent class clustering, Motorcycle crashes, Motorcyclist's fault status
-
Data clustering is a basic tool for understanding the structure of data sets. Clustering is the process of placing data into groups of similar objects. It is one of the most important unsupervised learning problems: finding structure in a set of unlabeled data. Clustering algorithms can be divided into two categories according to the type of data: algorithms for numerical data and algorithms for categorical data. Clustering algorithms for categorical data are considered more important than those for numerical data because of the nature and applications of such data. In this paper, the nature of categorical data is described first, and the similarity measures and clustering algorithms proposed in this area are then reviewed. Finally, a hybrid method is proposed that combines a hierarchical clustering algorithm with a partitioning clustering algorithm in order to cluster this type of data better.
The experiments show that the proposed method improves the clustering results.
Keywords: Data mining, Clustering, Categorical data, k-modes clustering, Hierarchical clustering
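For categorical data, distances and cluster centres have to be defined without arithmetic on the values. The sketch below shows the usual building blocks of k-modes, the partitioning algorithm named in the keywords: the simple matching dissimilarity and the attribute-wise mode. The abstract does not describe how the hierarchical and partitioning steps are combined, so this is only the partitioning half, with caller-supplied starting modes (which, as an assumption, could come from a hierarchical pre-clustering). All names are hypothetical.

```python
from collections import Counter


def mismatch(a, b):
    """Simple matching dissimilarity: the number of attributes on which
    two categorical records differ."""
    return sum(x != y for x, y in zip(a, b))


def mode_of(records):
    """Attribute-wise most frequent value; ties go to the value seen first."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*records))


def k_modes(records, modes, max_iter=20):
    """k-modes with caller-supplied initial modes (tuples of attribute values).
    Returns (labels, modes)."""
    modes = list(modes)
    labels = []
    for _ in range(max_iter):
        # Assignment step: each record joins the nearest mode.
        labels = [min(range(len(modes)), key=lambda k: mismatch(r, modes[k]))
                  for r in records]
        # Update step: recompute each mode from its members.
        new_modes = []
        for k, old in enumerate(modes):
            members = [r for r, l in zip(records, labels) if l == k]
            new_modes.append(mode_of(members) if members else old)
        if new_modes == modes:
            break
        modes = new_modes
    return labels, modes
```

The mode plays the role that the mean plays in k-means, which is why the algorithm works on attributes such as colour or category that have no numeric ordering.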
-
Vehicle detection is one of the important tasks in automatic driving. It is a hard problem that has attracted many researchers. Most commercial vehicle detection systems are based on radar, but these systems have limitations, for example with zigzag motion. Image processing techniques can overcome these problems. This paper introduces a method based on hierarchical clustering of low-level image features for on-road vehicle detection. Each vehicle is treated as a cluster. In traditional clustering methods the threshold distance for each cluster is fixed, whereas in this paper an adaptive threshold varies according to the position of each cluster. The threshold is computed with a bivariate normal distribution. Sampling and teammate selection for each cluster are performed with a member-based weighted average. For this purpose, unlike other methods that use only horizontal or vertical lines, a full edge detection algorithm is used. Corners are an important feature of video images and are commonly used in vehicle detection systems. In this paper, Harris features are used to detect corners. The LISA data set is used to evaluate the proposed method, and several experiments are carried out to investigate its performance. Experimental results show good performance compared to other algorithms.
Keywords: adaptive feature grouping, moving camera image processing, vehicle detection, hierarchical clustering, low-level features
-
Traditional statistical methods for analyzing today’s large volumes of spatial data carry high computational burdens. To address this deficiency, relatively modern data mining techniques have recently been applied to various spatial analysis tasks, with the aim of autonomous knowledge extraction from high-volume spatial data. Geospatial data is considered well suited to data mining techniques. The main purpose of this paper is to present a hybrid geospatial data clustering mechanism that provides a high-performance hotspot analysis method. The method works on the 2- or 3-dimensional geographic coordinates of different natural and unnatural phenomena. It uses the systematic cooperation of two popular clustering algorithms: AGglomerative NESting (AGNES), a hierarchical clustering method, and k-means, a partitional clustering method. The hybrid method is claimed to inherit the low time complexity of k-means and the relative independence from user knowledge of AGNES. The proposed method is therefore expected to be faster than AGNES and more accurate than k-means. The method was evaluated against two popular clustering evaluation criteria: the first is adapted from Fisher’s separability criterion, and the second is the popular minimum total distance measure. The evaluation shows that the proposed hybrid method achieves acceptable performance: it has a desirable time complexity and a higher cluster quality than its parents (AGNES and k-means). Real-time processing of hotspots requires an efficient approach with low time complexity, so time complexity was taken into account in designing the proposed approach.
Keywords: Geospatial data, Geographical knowledge discovery, Hotspot analysis, Hierarchical clustering, Partitional clustering, Hybrid clustering approach, Earthquake hotspots, Crime mapping
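One way two such algorithms can cooperate is sketched below. The abstract does not describe the actual mechanism, so the design here is an assumption: k-means first compresses the coordinates into a deliberately generous number of centres, and a single-linkage agglomerative pass then merges centres that lie within a `cut` distance of each other. The names `k_means`, `agnes`, `hybrid` and the parameter `cut` are hypothetical, and the k-means start is made deterministic only to keep the example reproducible.

```python
import math


def assign(points, centres):
    """Index of the nearest centre for every point."""
    return [min(range(len(centres)), key=lambda c: math.dist(p, centres[c]))
            for p in points]


def k_means(points, k, iters=20):
    """Lloyd's algorithm, started from every (n // k)-th point."""
    step = len(points) // k
    centres = [tuple(points[i * step]) for i in range(k)]
    for _ in range(iters):
        labels = assign(points, centres)
        new = []
        for c, old in enumerate(centres):
            members = [p for p, l in zip(points, labels) if l == c]
            new.append(tuple(sum(a) / len(members) for a in zip(*members))
                       if members else old)
        if new == centres:
            break
        centres = new
    return centres, assign(points, centres)


def agnes(centres, cut):
    """Single-linkage agglomeration over the centres; stops once the two
    closest groups are farther apart than cut."""
    groups = [[i] for i in range(len(centres))]

    def gap(g, h):
        return min(math.dist(centres[i], centres[j]) for i in g for j in h)

    while len(groups) > 1:
        d, a, b = min((gap(groups[a], groups[b]), a, b)
                      for a in range(len(groups))
                      for b in range(a + 1, len(groups)))
        if d > cut:
            break
        merged = groups.pop(b)      # b > a, so index a stays valid
        groups[a].extend(merged)
    return groups


def hybrid(points, k, cut):
    """k-means for speed, then agglomerative merging of its centres."""
    centres, labels = k_means(points, k)
    final = {}
    for gid, members in enumerate(agnes(centres, cut)):
        for c in members:
            final[c] = gid
    return [final[l] for l in labels]
```

In this arrangement the agglomerative step runs on `k` centres instead of all points, which is where a time saving over plain AGNES would come from. The final number of clusters then follows from `cut` and does not have to be fixed in advance.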