Table of Contents


Journal of Artificial Intelligence and Data Mining
Volume 5, Issue 2, Summer-Autumn 2017

  • Publication date: 1396/03/24 (Solar Hijri)
  • Number of titles: 15
  • F. Karimian, S. M. Babamir * Pages 149-167
    Reliability of software depends on its fault-prone modules: the fewer fault-prone units a software system contains, the more we may trust it. Therefore, if we can predict the number of fault-prone modules in a software system, we can judge its reliability. In predicting software fault-prone modules, one of the contributing features is the software metric, by which one can classify software modules into fault-prone and non-fault-prone ones. To make such a classification, we investigated 17 classifier methods whose features (attributes) are software metrics (39 metrics) and whose mining instances are the software modules of 13 datasets reported by NASA.
    However, two important issues influence prediction accuracy when we use data-mining methods: (1) selecting the best/most influential features (i.e., software metrics) when there is a wide diversity of them, and (2) sampling instances to balance the imbalanced classes, since a classifier biases towards the majority class when the classes are imbalanced. Based on feature selection and instance sampling, we considered 4 scenarios in the appraisal of the 17 classifier methods for predicting software fault-prone modules. To select features, we used Correlation-based Feature Selection (CFS), and to sample instances we used the Synthetic Minority Oversampling Technique (SMOTE). Empirical results showed that suitably sampling software modules significantly influences the accuracy of predicting software reliability, whereas metric selection has no considerable effect on the prediction (a minimal sketch of such a pipeline follows the keywords).
    Keywords: Software fault prediction, Classifier performance, Feature selection, Data sampling, Software metric
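    A minimal sketch of the kind of pipeline this paper evaluates, assuming scikit-learn and imbalanced-learn. The data here is a synthetic stand-in for the NASA datasets, and the univariate filter is a simple stand-in for CFS, which scikit-learn does not ship; only SMOTE is exactly the technique named in the abstract.
    ```python
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.feature_selection import SelectKBest, f_classif
    from sklearn.metrics import classification_report
    from sklearn.model_selection import train_test_split
    from imblearn.over_sampling import SMOTE

    rng = np.random.default_rng(0)
    X = rng.random((500, 39))                 # 39 software metrics (synthetic)
    y = (rng.random(500) < 0.15).astype(int)  # imbalanced fault-prone class

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

    selector = SelectKBest(f_classif, k=10).fit(X_tr, y_tr)  # metric selection (CFS stand-in)
    X_tr_s, X_te_s = selector.transform(X_tr), selector.transform(X_te)

    X_bal, y_bal = SMOTE(random_state=0).fit_resample(X_tr_s, y_tr)  # oversample minority class

    clf = RandomForestClassifier(random_state=0).fit(X_bal, y_bal)   # one of many candidate classifiers
    print(classification_report(y_te, clf.predict(X_te_s)))
    ```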
  • M. B. Dowlatshahi, V. Derhami * Pages 169-181
    A combinatorial auction is an auction in which bidders may bid on bundles of items. The Winner Determination Problem (WDP) in combinatorial auctions is the problem of finding winning bids that maximize the auctioneer’s revenue under the constraint that each item can be allocated to at most one bidder. The WDP is an NP-hard problem with practical applications in electronic commerce, production management, game theory, and resource allocation in multi-agent systems. This has motivated the quest for approximate algorithms that are efficient in terms of both solution quality and computational time. This paper proposes a hybrid Ant Colony Optimization with a novel Multi-Neighborhood Local Search (ACO-MNLS) algorithm for solving the WDP. Our proposed MNLS algorithm uses the facts that different neighborhoods in local search can generate different local optima for the WDP, and that a global optimum of the WDP is also a local optimum with respect to any given neighborhood. Therefore, the proposed MNLS algorithm simultaneously explores a set of three different neighborhoods to obtain different local optima and to escape from them. Comparisons between ACO-MNLS, Genetic Algorithm (GA), Memetic Algorithm (MA), Stochastic Local Search (SLS), and Tabu Search (TS) on various benchmark problems confirm the efficiency of ACO-MNLS in terms of solution quality and computational time (a toy formulation of the WDP follows the keywords).
    Keywords: Winner Determination Problem, Combinatorial Auctions, Ant Colony Optimization, Multi-Neighborhood Search, Combinatorial Optimization
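    To make the problem concrete, here is a toy WDP instance solved by exhaustive enumeration; the instance is hypothetical, and the enumeration is what ACO-MNLS replaces with a guided search on realistically sized inputs.
    ```python
    # Winner determination: pick a conflict-free subset of bids maximizing revenue.
    from itertools import combinations

    bids = [(10, {1, 2}), (8, {2, 3}), (6, {3}), (14, {1, 2, 3})]  # (price, bundle)

    def feasible(selection):
        """Each item may be allocated to at most one bidder."""
        taken = set()
        for _, items in selection:
            if taken & items:
                return False
            taken |= items
        return True

    best, best_rev = [], 0
    for r in range(1, len(bids) + 1):          # exhaustive only for this toy size
        for combo in combinations(bids, r):
            rev = sum(price for price, _ in combo)
            if feasible(combo) and rev > best_rev:
                best, best_rev = list(combo), rev
    print(best_rev, best)                      # -> 16, bids {1,2} and {3}
    ```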
  • Sh. Lotfi, F. Karimi * Pages 183-195
    In many real-world applications, optimization problems with conflicting objectives are very common. In this paper we employ the Multi-Objective Evolutionary Algorithm based on Decomposition (MOEA/D), a recently developed method, together with Tabu Search (TS) to obtain a new approach for solving multi-objective optimization problems (MOPs) with two or three conflicting objectives. This hybrid algorithm, called MOEA/D-TS, uses the parallel computing capacity of MOEA/D along with the neighborhood-search capability of TS to discover Pareto optimal solutions. Our goal is to exploit the advantages of evolutionary algorithms and TS to achieve an integrated method that covers the totality of the Pareto front with uniformly distributed solutions. In order to evaluate the capabilities of the proposed method, its performance, based on various metrics, is compared with SPEA, COMOEATS and SPEA2TS on the well-known Zitzler-Deb-Thiele ZDT test suite and the DTLZ test functions with separable objective functions. According to the experimental results, the proposed method significantly outperforms the previous algorithms and produces fully satisfactory results (a sketch of the decomposition step follows the keywords).
    Keywords: Multi-objective problems, Evolutionary Algorithms, Hybrid method, MOEA-D, Tabu Search
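    The core of MOEA/D is decomposing a MOP into scalar subproblems. A common choice, shown below as a minimal sketch, is the weighted Tchebycheff scalarization; the abstract does not state which scalarization MOEA/D-TS uses, so this is illustrative only.
    ```python
    # Weighted-Tchebycheff decomposition: each weight vector defines one scalar
    # subproblem; neighboring subproblems share solutions, and in a hybrid like
    # MOEA/D-TS a Tabu Search would refine each subproblem's incumbent.
    def tchebycheff(fvals, weights, z_star):
        """Scalarize one objective vector against the reference point z*."""
        return max(w * abs(f - z) for f, w, z in zip(fvals, weights, z_star))

    print(tchebycheff([0.4, 0.7], [0.5, 0.5], [0.0, 0.0]))  # -> 0.35
    ```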
  • A. Mousavi *, A. Sheikh Mohammad Zadeh, M. Akbari, A. Hunter Pages 197-210
    Mobile technologies have enabled a variety of Internet-based, location-based services. The adoption of these services by users has produced mammoth amounts of trajectory data. To use these services effectively, such data must be analyzed across different application domains in order to identify the activities that users might perform in different places. Researchers from different communities have developed models and techniques to extract activity types from such data, but they have mainly focused on the geometric properties of trajectories and do not consider the semantic aspects of moving objects. This work proposes a new ontology-based approach for recognizing human activity from GPS data, for understanding and interpreting mobility data. The performance of the approach was tested and evaluated using a dataset acquired by a user over a year within the urban area of the City of Calgary in 2010. It was observed that the accuracy of the results was related to the availability of points of interest around the places where the user had stopped. Moreover, an evaluation experiment revealed the effectiveness of the proposed method, with a 50% performance improvement and a complexity trend of O(n).
    Keywords: Ontology, Data mining, Activity Recognition, Semantic, GPS
  • A. Goshvarpour, A. Abbasi *, A. Goshvarpour Pages 211-221
    Emotion, as a psychophysiological state, plays an important role in human communication and daily life. Emotion studies based on physiological signals have recently been the subject of much research. In this study, a hybrid feature-based approach was proposed to examine affective states. To this end, Electrocardiogram (ECG) signals of 47 students were recorded using a pictorial emotion elicitation paradigm. Affective pictures were selected from the International Affective Picture System and assigned to four different emotion classes. After extracting the approximate and detail coefficients of the Wavelet Transform (WT / Daubechies 4 at level 8), two measures of the second-order difference plot (CTM and D) were calculated for each set of wavelet coefficients. Subsequently, a Least Squares Support Vector Machine (LS-SVM) was applied to discriminate between the affective states and rest. The statistical analysis indicated that the distribution of CTM at rest is distinct from that of the emotional categories. In addition, the second-order difference plot measures at the last level of WT coefficients showed significant differences between rest and the emotion categories. Applying LS-SVM, a maximum classification rate of 80.24% was reached for discrimination between rest and fear. The results of this study indicate the usefulness of the WT in combination with nonlinear techniques in characterizing emotional states (a sketch of the CTM computation follows the keywords).
    Keywords: Combining Features, Electrocardiogram, Emotion, Second-Order Difference Plot, Wavelet Transform
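    A minimal sketch of the named feature extraction, assuming PyWavelets: a db4 decomposition at level 8 followed by the Central Tendency Measure (CTM), i.e. the fraction of second-order difference plot points falling inside a radius r. The signal and the radius are synthetic stand-ins, and the D measure is omitted.
    ```python
    import numpy as np
    import pywt

    ecg = np.random.randn(4096)                 # stand-in for a recorded ECG epoch
    coeffs = pywt.wavedec(ecg, 'db4', level=8)  # approximate + detail coefficients

    def ctm(a, r):
        """Fraction of (a[i+1]-a[i], a[i+2]-a[i+1]) points inside radius r."""
        dx, dy = a[1:-1] - a[:-2], a[2:] - a[1:-1]
        return np.mean(np.sqrt(dx**2 + dy**2) < r)

    for i, c in enumerate(coeffs):
        print(f"level {i}: CTM = {ctm(c, r=0.5):.3f}")
    ```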
  • M. Nikpour, R. Karami *, R. Ghaderi Pages 223-234
    Sparse coding is an unsupervised method that learns a set of over-complete bases to represent data such as images and videos. It has attracted increasing attention for image classification applications in recent years. However, when there are similar images from different classes, as in face recognition applications, different images may be assigned to the same class, and the classification performance may decrease. In this paper, we propose an Affine Graph Regularized Sparse Coding approach for the face recognition problem. Experiments on several well-known face datasets show that the proposed method can significantly improve face classification accuracy. In addition, experiments illustrate the robustness of the proposed method to noise. The results show the superiority of the proposed method in comparison to several other face classification methods (the general form of the objective it builds on follows the keywords).
    Keywords: Sparse coding, Manifold Learning, Face recognition, Graph Regularization
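    For orientation, graph regularized sparse coding generally augments the sparse coding objective with a graph Laplacian term; the affine variant's exact formulation is in the paper, so the form below is an assumption about the baseline it extends.
    ```latex
    \min_{D,\,S}\; \|X - DS\|_F^2
      \;+\; \alpha\,\mathrm{Tr}\!\left(S L S^{\top}\right)
      \;+\; \beta \sum_i \|s_i\|_1
    \quad \text{s.t.}\ \|d_j\|_2 \le 1 \;\; \forall j
    ```
    Here X holds the training images as columns, D is the dictionary, S the sparse codes, and L the Laplacian of a nearest-neighbour affinity graph; the trace term forces codes of neighbouring faces to stay close on the data manifold.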
  • F. Fadaei Noghani, M. Moattar * Pages 235-243
    With the rise of technology, the possibility of fraud in areas such as banking has increased. Credit card fraud is a crucial problem in banking, and its threat is ever increasing. This paper proposes an advanced data-mining method that considers both feature selection and decision cost to enhance the accuracy of credit card fraud detection. After selecting the best and most effective features using an extended wrapper method, ensemble classification is performed. The extended feature selection approach comprises a prior feature-filtering step and a wrapper approach using a C4.5 decision tree. Ensemble classification with cost-sensitive decision trees is performed in a decision-forest framework. A locally gathered fraud detection dataset is used to evaluate the proposed method. The proposed method is assessed using accuracy, recall, and F-measure as evaluation metrics and compared with basic classification algorithms including ID3, J48, Naïve Bayes, Bayesian Network and NB tree. Experiments show that, with F-measure as the evaluation metric, the proposed approach yields a 1.8 to 2.4 percent performance improvement compared to the other classifiers (a minimal cost-sensitive ensemble sketch follows the keywords).
    Keywords: credit card fraud detection, Feature selection, ensemble classification, cost sensitive learning
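    A minimal sketch of cost-sensitive trees in a bagged forest, assuming scikit-learn; class weights stand in for the paper's decision-cost mechanism, and the data is a synthetic stand-in for the locally gathered dataset.
    ```python
    import numpy as np
    from sklearn.ensemble import BaggingClassifier
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.metrics import f1_score

    rng = np.random.default_rng(0)
    X = rng.random((1000, 20))
    y = (rng.random(1000) < 0.05).astype(int)   # rare fraud class (synthetic)

    # Missing a fraud is costlier than a false alarm, so weight class 1 heavily:
    base = DecisionTreeClassifier(class_weight={0: 1, 1: 20})
    forest = BaggingClassifier(estimator=base, n_estimators=50,  # `base_estimator` on older scikit-learn
                               random_state=0).fit(X, y)
    print(f1_score(y, forest.predict(X)))       # on training data, illustration only
    ```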
  • V. Ghasemi *, A. Pouyan, M. Sharifi Pages 245-258
    This paper proposes a scheme for activity recognition in sensor-based smart homes using the Dempster-Shafer theory of evidence. In this work, opinion owners and their belief masses are constructed from sensors and employed in a single-layered inference architecture. The belief masses are calculated using the beta probability distribution function, and the frames of the opinion owners are derived automatically for the activities, to achieve more flexibility and extensibility. Our method is verified via two experiments. In the first experiment, it is compared to a naïve Bayes approach and three ontology-based methods. Our method outperforms the naïve Bayes classifier, reaching 88.9% accuracy, and is comparable to the ontology-based schemes; but since no manual ontology definition is needed, it is more flexible and extensible than the previous ones. In the second experiment, a larger dataset is used and our method is compared to three approaches based on naïve Bayes classifiers, hidden Markov models, and hidden semi-Markov models. Three features are extracted from the sensors' data and incorporated into the benchmark methods, yielding nine implementations. In this experiment our method shows an accuracy of 94.2%, which in most cases outperforms the benchmark methods or is comparable to them (a sketch of Dempster's combination rule follows the keywords).
    Keywords: Activity Recognition, Dempster-Shafer theory of evidence, smart homes
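    At the heart of any Dempster-Shafer scheme is the combination of evidence from multiple sources. This sketch implements the standard Dempster rule for two mass functions over a tiny hypothetical frame of activities; the sensors and masses are invented for illustration.
    ```python
    # Dempster's rule: conjunctive combination with conflict normalization.
    # Focal elements are frozensets of hypotheses from the frame of discernment.
    def combine(m1, m2):
        raw, conflict = {}, 0.0
        for a, wa in m1.items():
            for b, wb in m2.items():
                inter = a & b
                if inter:
                    raw[inter] = raw.get(inter, 0.0) + wa * wb
                else:
                    conflict += wa * wb            # mass on empty intersections
        return {k: v / (1.0 - conflict) for k, v in raw.items()}

    cook, sleep = frozenset({'cook'}), frozenset({'sleep'})
    either = cook | sleep
    m_stove = {cook: 0.7, either: 0.3}             # evidence from a stove sensor
    m_motion = {cook: 0.5, sleep: 0.2, either: 0.3}  # evidence from a motion sensor
    print(combine(m_stove, m_motion))              # mass concentrates on 'cook'
    ```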
  • T. Zare *, M. T. Sadeghi, H. R. Abutalebi, J. Kittler Pages 259-273
    Machine-learning solutions to classification, clustering and matching problems critically depend on the adopted metric, which in the past was selected heuristically. In the last decade, it has been demonstrated that an appropriate metric can be learnt from data, resulting in superior performance compared with traditional metrics. This has recently stimulated considerable interest in the topic of metric learning, especially using kernel functions, which map data to feature spaces with enhanced class separability and implicitly define a new metric in the original feature space. The formulation of the metric learning problem depends on the supervisory information available for the task. In this paper, we focus on semi-supervised kernel-based distance metric learning where the training data set is unlabelled, with the exception of a small subset of pairs of points labelled as belonging to the same class (cluster) or different classes (clusters). The proposed method involves creating a pool of kernel functions. The corresponding kernel matrices are first clustered to remove redundancy in representation. A composite kernel constructed from the kernel clustering result is then expanded into an orthogonal set of basis functions. The mixing parameters of this expansion are then optimised using the point similarity and dissimilarity information conveyed by the labels. The proposed method is evaluated on synthetic and real data sets. The results show the merit of using similarity and dissimilarity information jointly, compared to using just the similarity information, and the superiority of the proposed method over the recently introduced metric learning approaches.
    Keywords: Distance Metric Learning, Semi-supervised Clustering, Composite Kernels, Pairwise Similarity, Dissimilarity Constraints, Optimisation Problem
  • F. Safi-Esfahani *, Sh. Rakian, M.H. Nadimi-Shahraki Pages 275-284
    Plagiarism, defined as “the wrongful appropriation of other writers’ or authors’ works and ideas without citing or informing them”, poses a major challenge to the publication and spread of knowledge. Plagiarism falls into four categories: direct, paraphrasing (rewriting), translation, and combinatory. This paper addresses translational plagiarism, sometimes referred to as cross-lingual plagiarism, in which writers meld a translation with their own words and ideas. Building on monolingual plagiarism detection methods, this paper aims to find a way to detect cross-lingual plagiarism. A framework called Multi-Lingual Plagiarism Detection (MLPD) is presented for cross-lingual plagiarism analysis, with the ultimate objective of detecting plagiarism cases. English is the reference language, and Persian materials are back-translated using translation tools. The data for the assessment of MLPD were obtained from the English-Persian Mizan parallel corpus. Apache Solr was applied to crawl and index the documents. The mean accuracy of the proposed method was 98.82% when employing highly accurate translation tools, which indicates the high accuracy of the proposed method, whereas with the Google translation service the mean accuracy was 56.9%. These tests demonstrate that improved translation tools enhance the accuracy of the proposed method.
    Keywords: Text Retrieval, Cross-lingual, Text Similarity, Translation, Plagiarism
  • M. Farhid *, M. Shamsi, M. H. Sedaaghi Pages 285-291
    Adaptive networks comprise a set of nodes with adaptation and learning abilities for modeling various types of self-organized and complex activities encountered in the real world. This paper presents the effect of a heterogeneously distributed incremental LMS algorithm with ideal links on the quality of unknown-parameter estimation. In heterogeneous adaptive networks, a fraction of the nodes, selected based on a previously calculated signal-to-noise ratio (SNR), are assumed to be informed nodes that collect data and perform in-network processing, while the remaining nodes are assumed to be uninformed and participate only in the processing tasks. As our simulation results show, the proposed algorithm not only considerably improves the performance of the distributed incremental LMS algorithm under the same conditions, but also achieves good estimation accuracy in cases where some of the nodes make unreliable observations (noisy nodes). The application of the same algorithm to cases where node failures happen is also studied (a minimal incremental LMS sketch follows the keywords).
    Keywords: Adaptive networks, distributed estimation, Least mean-square (LMS), informed nodes, mean square deviation (MSD)
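    A minimal sketch of the underlying incremental (ring) LMS scheme: a single weight estimate circulates through the nodes, and each node updates it with its own local data. The parameter, noise level, and network size are synthetic stand-ins; the informed/uninformed split of the paper is not modeled here.
    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    w_true = np.array([0.5, -0.3, 0.8])       # unknown parameter to estimate
    n_nodes, mu = 10, 0.05                    # ring size and LMS step size

    w = np.zeros(3)                           # estimate passed around the ring
    for _ in range(200):                      # repeated cycles over the ring
        for k in range(n_nodes):
            u = rng.standard_normal(3)        # node k's local regressor
            d = u @ w_true + 0.1 * rng.standard_normal()  # noisy measurement
            w = w + mu * u * (d - u @ w)      # incremental LMS update at node k
    print(np.round(w, 3))                     # converges toward w_true
    ```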
  • M. Lashkari, M. Moattar * Pages 293-305
    K-means is a well-known clustering algorithm. Besides advantages such as high speed and ease of use, it suffers from the problem of local optima. To overcome this problem, many clustering studies have been conducted. This paper presents a hybrid of an Extended Cuckoo Optimization Algorithm (ECOA) and K-means, called ECOA-K. The COA algorithm has advantages such as a fast convergence rate, intelligent operators and simultaneous local and global search, which are the motivations for choosing it. In the Extended Cuckoo Algorithm, we enhance the operators of the classical Cuckoo algorithm: the initial population is generated from a chaotic sequence, whereas the classical version uses a random sequence (a sketch of chaotic initialization follows the keywords); the number of eggs allocated to each cuckoo is based on its fitness; and the cuckoos' migration is performed with different deviation degrees. The proposed method is evaluated on several standard data sets from the UCI repository, and its performance is compared with those of Black Hole (BH), Big Bang Big Crunch (BBBC), Cuckoo Search Algorithm (CSA), the traditional Cuckoo Optimization Algorithm (COA) and the K-means algorithm. The results are compared in terms of purity degree, coefficient of variance, convergence rate and time complexity. The simulation results show that the proposed algorithm yields optimized solutions with a higher purity degree, a faster convergence rate and greater stability than the other compared algorithms.
    Keywords: Clustering, K-means algorithm, Cuckoo Optimization Algorithm (COA), Chaotic Function, Migration
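    A minimal sketch of chaotic population initialization, the kind of operator ECOA-K substitutes for purely random initialization; the logistic map is a common choice, though the abstract does not name the paper's exact chaotic function.
    ```python
    import numpy as np

    def chaotic_population(n_individuals, dim, low, high, x0=0.7, r=4.0):
        """Generate individuals from a logistic map instead of uniform noise."""
        x, pop = x0, np.empty((n_individuals, dim))
        for i in range(n_individuals):
            for j in range(dim):
                x = r * x * (1.0 - x)               # logistic map iteration
                pop[i, j] = low + x * (high - low)  # scale into the search range
        return pop

    print(chaotic_population(5, 3, low=-1.0, high=1.0))
    ```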
  • P. Shahsamandi Esfahani *, A. Saghaei Pages 307-317
    Data clustering is one of the most important areas of research in data mining and knowledge discovery. Recent research in this area has shown that the best clustering results can be achieved using multi-objective methods. In other words, assuming more than one criterion as the objective functions for clustering data can measurably increase the quality of the clustering. In this study, a model with two conflicting objective functions is proposed, based on maximum data compactness in clusters (the degree of proximity of the data) and maximum cluster separation (the degree of remoteness of the clusters' centers); both are sketched after the keywords. To solve this model, a recently proposed optimization method, the Multi-objective Improved Teaching-Learning Based Optimization (MOITLBO) algorithm, is used. This algorithm is tested on several datasets, and its clusters are compared with the results of some single-objective algorithms. Furthermore, a comparison of the proposed model with another multi-objective model with respect to noise shows that it is robust to noisy data sets and can thus be used efficiently for multi-objective fuzzy clustering.
    Keywords: Fuzzy clustering, Cluster validity measure, Multi-Objective Optimization, meta-heuristic algorithms, Improved Teaching-Learning Based Optimization
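    A minimal (crisp, non-fuzzy) sketch of the two conflicting criteria: within-cluster compactness to minimize and between-center separation to maximize. The exact fuzzy formulations are in the paper; these are common textbook versions.
    ```python
    import numpy as np

    def compactness(X, labels, centers):
        """Sum of squared distances of points to their assigned cluster centers."""
        return sum(np.sum((X[labels == k] - c) ** 2) for k, c in enumerate(centers))

    def separation(centers):
        """Minimum pairwise distance between cluster centers."""
        c = np.asarray(centers)
        d = np.linalg.norm(c[:, None, :] - c[None, :, :], axis=-1)
        return d[np.triu_indices(len(c), k=1)].min()

    X = np.random.rand(100, 2)
    centers = np.array([[0.2, 0.2], [0.8, 0.8]])
    labels = np.argmin(np.linalg.norm(X[:, None] - centers[None], axis=-1), axis=1)
    print(compactness(X, labels, centers), separation(centers))
    ```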
  • E. Fadaei-Kermani *, G. A. Barani, M. Ghaeini-Hessaroeyeh Pages 319-325
    Drought is a climatic phenomenon that may occur under any climate condition and in any region on earth. Effective drought management depends on the application of appropriate drought indices, the variables used to detect and characterize drought conditions. This study attempts to predict drought occurrence, based on the standard precipitation index (SPI), using k-nearest-neighbor modeling (a minimal sketch follows the keywords). The model was tested using precipitation data from Kerman, Iran, and gives reasonable predictions of the drought situation in the region. Finally, the efficiency and precision of the model were quantified by statistical coefficients: appropriate values of the correlation coefficient (r = 0.874), mean absolute error (MAE = 0.106), root mean square error (RMSE = 0.119) and coefficient of residual mass (CRM = 0.0011) indicate that the present model is suitable and efficient.
    Keywords: Drought monitoring, Standard precipitation index, Nearest neighbor model, Model evaluation
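    A minimal sketch of k-nearest-neighbor prediction of the next SPI value from lagged SPI values, assuming scikit-learn; the SPI series, lag count, and k are synthetic stand-ins rather than the paper's settings.
    ```python
    import numpy as np
    from sklearn.neighbors import KNeighborsRegressor

    spi = np.random.randn(240)                  # stand-in for a monthly SPI series
    lags = 3
    X = np.array([spi[i:i + lags] for i in range(len(spi) - lags)])
    y = spi[lags:]                              # next month's SPI

    model = KNeighborsRegressor(n_neighbors=5).fit(X[:-12], y[:-12])
    pred = model.predict(X[-12:])               # hold out the last 12 months
    print(np.round(pred, 2))
    ```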
  • A. Moshar Movahhed, H. Toossian Shandiz *, Syed K. Hoseini Sani Pages 327-335
    In this paper, a fractional-order averaged model for a DC/DC buck converter operating in continuous conduction mode (CCM) is established. The DC/DC buck converter is one of the main components of the wind turbine system used in this research. Due to practical restrictions, the converter's input voltage and duty cycle were not available; therefore, the whole wind system was simulated in MATLAB/Simulink, and the gathered data were used in the proposed trial-and-error method to find the fractional order of the converter (a generic form of such a model follows the keywords). There is an obvious relationship between controller performance and the mathematical model: a more accurate model leads to a better controller.
    Keywords: Fractional calculus, Buck converter, Averaged circuit model, Wind turbine
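    For orientation, a generic fractional-order averaged buck model in CCM replaces the integer derivatives of the standard averaged model with fractional orders; the form below is an assumption, not necessarily the paper's exact parameterization, with the orders α and β playing the role the paper identifies by trial and error.
    ```latex
    L\,\frac{d^{\alpha} i_L}{dt^{\alpha}} = d\,V_{in} - v_C,
    \qquad
    C\,\frac{d^{\beta} v_C}{dt^{\beta}} = i_L - \frac{v_C}{R},
    \qquad 0 < \alpha,\ \beta \le 1
    ```
    Here d is the duty cycle, i_L the inductor current, v_C the capacitor (output) voltage, and R the load; setting α = β = 1 recovers the classical averaged model.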