Table of Contents

Journal of Artificial Intelligence and Data Mining
Volume: 6, Issue: 2, Summer-Autumn 2018

  • Publication date: 1396/12/27 (Solar Hijri calendar)
  • Number of articles: 18
  • M. Amin-Naji, A. Aghagolzadeh * Pages 233-250
    The purpose of multi-focus image fusion is to gather the essential information and the focused parts of the input multi-focus images into a single image. These multi-focus images are captured with different depths of focus of the cameras. Many multi-focus image fusion techniques have been introduced that compute the focus measurement in the spatial domain. However, multi-focus image fusion processing is much more time-efficient and appropriate in the discrete cosine transform (DCT) domain, especially when JPEG images are used in visual sensor networks (VSN). Therefore, most researchers are interested in calculating focus measurements and performing the fusion process directly in the DCT domain, and many techniques have been developed that substitute the spatial-domain fusion process with a DCT-domain one. Previous works in the DCT domain have shortcomings in selecting suitable divided blocks according to their focus-measurement criterion. In this paper, the calculation of two powerful focus measurements, energy of Laplacian (EOL) and variance of Laplacian (VOL), is proposed directly in the DCT domain. In addition, two other new focus measurements, which work by measuring the correlation coefficient between source blocks and artificially blurred blocks, are developed completely in the DCT domain. Furthermore, a new consistency verification method is introduced as a post-processing step, which improves the quality of the fused image significantly. The proposed methods significantly reduce the drawbacks caused by unsuitable block selection. The quality of the output images of the proposed methods is demonstrated by comparing the results of the proposed algorithms with those of previous algorithms.
    Keywords: Image Fusion, Multi-Focus, Visual Sensor Networks, discrete cosine transform, Variance, Energy of Laplacian
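    For illustration, the following is a minimal spatial-domain sketch of the two focus measures named in the abstract (EOL and VOL); the paper itself derives equivalent measures directly from the DCT coefficients of each block, which is not reproduced here.

    ```python
    import numpy as np
    from scipy.ndimage import laplace

    def focus_measures(block):
        """Energy of Laplacian (EOL) and variance of Laplacian (VOL) of an image block."""
        lap = laplace(block.astype(float))   # discrete Laplacian of the block
        eol = np.sum(lap ** 2)               # EOL: sum of squared Laplacian responses
        vol = np.var(lap)                    # VOL: variance of the Laplacian responses
        return eol, vol

    # Fusion rule sketch: for each pair of co-located blocks from the two source
    # images, keep the block whose focus measure is larger (the sharper one).
    ```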
  • A.M. Esmilizaini, A.M. Latif *, Gh. Barid Loghmani Pages 251-262
    Image zooming is one of the current issues of image processing, where maintaining the quality and structure of the zoomed image is important. To zoom an image, extra pixels must be inserted into the image data. The added data must be consistent with the texture of the image and must not create artificial blocks. In this study, the required pixels are estimated using radial basis functions, with the shape parameter c calculated by a genetic algorithm. Then, all the estimated pixels are revised based on an edge-correction sub-algorithm. The proposed method is a non-linear method that preserves edges and minimizes the blur and block artifacts of the zoomed image. The proposed method is evaluated on several images to calculate the optimum shape parameter of the radial basis functions. Numerical results are presented using the PSNR and SSIM fidelity measures on different images and are compared with other methods. The average PSNR between the original image and the zoomed image is 33.16, which shows that zooming by a factor of 2 produces an image similar to the original, emphasizing that the proposed method performs efficiently.
    Keywords: Image zooming, Radial basis function, Genetic Algorithm, Interpolation
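    As an approach-level sketch of the interpolation step described above: radial basis function interpolation estimates a missing pixel from its known neighbours, with the quality controlled by the shape parameter c that the paper tunes with a genetic algorithm. The multiquadric basis below is an illustrative choice, not necessarily the one used in the paper.

    ```python
    import numpy as np

    def rbf_interpolate(known_xy, known_vals, query_xy, c=1.0):
        """Estimate values at query_xy from known samples with a multiquadric RBF."""
        phi = lambda r: np.sqrt(r ** 2 + c ** 2)                    # multiquadric basis
        d = np.linalg.norm(known_xy[:, None, :] - known_xy[None, :, :], axis=-1)
        w = np.linalg.solve(phi(d), known_vals)                     # interpolation weights
        dq = np.linalg.norm(query_xy[:, None, :] - known_xy[None, :, :], axis=-1)
        return phi(dq) @ w

    # Example: estimate the centre of a 2x2 pixel neighbourhood.
    xy = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
    vals = np.array([10., 12., 14., 16.])
    print(rbf_interpolate(xy, vals, np.array([[0.5, 0.5]]), c=1.0))
    ```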
  • S. Miri Rostami *, M. Ahmadzadeh Pages 263-276
    The application of data mining methods as a decision support system has great benefit for predicting the survival of new patients. It also has great potential for health researchers to investigate the relationship between risk factors and cancer survival. However, due to the imbalanced nature of datasets associated with breast cancer survival, the accuracy of survival prognosis models is a challenging issue for researchers. This study aims to develop a predictive model for the 5-year survivability of breast cancer patients and to discover relationships between certain predictive variables and survival. The dataset was obtained from the SEER database. First, the effectiveness of two synthetic oversampling methods, Borderline SMOTE and the Density-based Synthetic Oversampling method (DSO), is investigated to solve the class imbalance problem. Then a combination of particle swarm optimization (PSO) and correlation-based feature selection (CFS) is used to identify the most important predictive variables. Finally, in order to build a predictive model, three classifiers, decision tree (C4.5), Bayesian network, and logistic regression, are applied to the cleaned dataset. Assessment metrics such as accuracy, sensitivity, specificity, and G-mean are used to evaluate the performance of the proposed hybrid approach, and the area under the ROC curve (AUC) is used to evaluate the performance of the feature selection method. Results show that among all combinations, DSO + PSO_CFS + C4.5 presents the best performance in terms of accuracy, sensitivity, G-mean and AUC, with values of 94.33%, 0.930, 0.939 and 0.939, respectively.
    Keywords: breast cancer, survival, class imbalance problem, oversampling technique, Feature selection
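    A minimal sketch of the class-imbalance and classification stages only, using off-the-shelf components: Borderline-SMOTE from imbalanced-learn and an entropy-based decision tree standing in for C4.5. The paper's DSO oversampler and PSO-CFS feature selection have no standard library implementation and are not reproduced; X and y are placeholders for the SEER features and 5-year survival labels.

    ```python
    from imblearn.over_sampling import BorderlineSMOTE
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import accuracy_score, recall_score

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
    X_bal, y_bal = BorderlineSMOTE(random_state=0).fit_resample(X_tr, y_tr)  # oversample minority class
    clf = DecisionTreeClassifier(criterion="entropy").fit(X_bal, y_bal)      # C4.5-like tree
    y_pred = clf.predict(X_te)
    print(accuracy_score(y_te, y_pred), recall_score(y_te, y_pred))          # accuracy, sensitivity
    ```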
  • M. Abdar, M. Zomorodi-Moghadam * Pages 277-285
    In this paper, the accuracy of two machine learning algorithms, SVM and Bayesian network, is investigated as two important algorithms for the diagnosis of Parkinson's disease. We use Parkinson's disease data from the University of California, Irvine (UCI) repository. In order to optimize the SVM algorithm, different kernel functions and C parameters have been used, and our results show that SVM with a C parameter (C-SVM) and a polynomial kernel function, with an average accuracy of 99.18% in the testing step, outperforms the other kernel functions such as RBF and sigmoid as well as the Bayesian network algorithm. It is also shown that the ten most important factors in the SVM algorithm are Jitter (Abs), Subject #, RPDE, PPE, Age, NHR, Shimmer APQ 11, NHR, Total-UPDRS, Shimmer (dB) and Shimmer. We also show that the accuracy of the proposed C-SVM and RBF approaches is in direct proportion to the value of the C parameter, such that increasing C increases the accuracy for both kernel functions; unlike the polynomial and RBF kernels, however, the sigmoid kernel has an inverse relation with C. Using these methods, we can find the most effective factors common to both genders (male and female). To the best of our knowledge, there is no previous study on Parkinson's disease that identifies the most effective factors common to both genders.
    Keywords: Data mining, Parkinson's disease, SVM algorithm, Bayesian Network algorithm, C-SVM algorithm
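    A brief sketch of the kernel/C sweep described above using scikit-learn; X and y are placeholders for the (scaled) UCI Parkinson's features and labels, and the parameter grid is illustrative.

    ```python
    from sklearn.svm import SVC
    from sklearn.model_selection import GridSearchCV

    grid = GridSearchCV(
        SVC(),
        {"kernel": ["poly", "rbf", "sigmoid"], "C": [0.1, 1, 10, 100]},
        cv=5, scoring="accuracy",
    )
    grid.fit(X, y)                                  # cross-validated sweep over kernel and C
    print(grid.best_params_, grid.best_score_)
    ```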
  • Z. Sedighi *, R. Boostani Pages 287-295
    Although many studies have been conducted to improve clustering efficiency, most state-of-the-art schemes suffer from a lack of robustness and stability. This paper aims to propose an efficient approach for eliciting prior knowledge, in the form of must-link and cannot-link constraints, from the estimated distribution of the raw data in order to convert a blind clustering problem into a semi-supervised one. To estimate the density distribution of the data, a Weibull Mixture Model (WMM) is utilized due to its high flexibility. Another contribution of this study is a new hill and valley seeking algorithm for finding the constraints of the semi-supervised algorithm. It is assumed that each density peak stands on a cluster center; therefore, neighboring samples of each center are considered must-link samples, while near-centroid samples belonging to different clusters are considered cannot-link ones. The proposed approach is applied to a standard image dataset (designed for clustering evaluation) along with some UCI datasets. The results achieved on both databases demonstrate the superiority of the proposed method compared to conventional clustering methods.
    Keywords: Semi-supervised Clustering, Valley seeking scheme, Weibull mixture model (WMM)
  • Seyed M. Hosseinirad * Pages 297-311
    Due to resource constraints and dynamic parameters, reducing energy consumption has become the most important issue in wireless sensor network (WSN) topology design. Previously proposed hierarchical methods cluster a WSN into different cluster layers in a single application of an evolutionary algorithm with complicated parameters, which may reduce efficiency and performance. In fact, in WSN topology, adding a cluster layer is a tradeoff between time complexity and energy efficiency. In this study, considering the most important WSN design parameters, a novel dynamic multilayer hierarchical clustering approach using evolutionary algorithms for densely deployed WSNs is proposed. Different evolutionary algorithms, such as the Genetic Algorithm (GA), Imperialist Competitive Algorithm (ICA) and Particle Swarm Optimization (PSO), are used to find an efficient evolutionary algorithm for implementing the proposed clustering method. The obtained results demonstrate that PSO is more efficient than the other algorithms in providing maximum network coverage, efficient cluster formation and network traffic reduction. The simulation results of multilayer WSN clustering design using the PSO algorithm show that this novel approach significantly reduces communication energy and increases the network lifetime by up to 2.29 times while providing full network coverage (100%) up to 350 rounds (56% of the network lifetime), compared with WEEC and LEACH-ICA clustering.
    Keywords: Wireless sensor networks, cluster head, Genetic Algorithm, Imperialist Competitive Algorithm, Network Lifetime
  • F. Hoseini *, A. Shahbahrami, A. Yaghoobi Notash Pages 313-319
    One of the most important and typical applications of wireless sensor networks (WSNs) is target tracking. Although target tracking can provide benefits for large-scale WSNs and organize them into clusters, tracking a moving target in cluster-based WSNs suffers from a boundary problem. The main goal of this paper is to introduce an efficient and novel mobility-management protocol, Target Tracking Based on Virtual Grid (TTBVG), which integrates on-demand dynamic clustering into a cluster-based WSN for target tracking. This protocol converts on-demand dynamic clusters into scalable cluster-based WSNs by using boundary nodes, and facilitates sensor collaboration around clusters. In this manner, each sensor node has a probability of becoming a cluster head and perceives the tradeoff between energy consumption and local sensor collaboration in cluster-based sensor networks. The simulation results of this study demonstrate the efficiency of the proposed protocol in both one-hop and multi-hop cluster-based sensor networks.
    Keywords: target tracking, Virtual Grid, Clustering, Wireless sensor networks, Dynamic Clustering
  • Kh. Sadatnejad, S. Shiry Ghidari *, M. Rahmati Pages 321-334
    The kernel trick and projection to tangent spaces are two choices for linearizing data points lying on Riemannian manifolds. These approaches are used to provide the prerequisites for applying standard machine learning methods on Riemannian manifolds. Classical kernels implicitly project data to a high-dimensional feature space without considering the intrinsic geometry of the data points. Projection to tangent spaces truly preserves topology along radial geodesics. In this paper, we propose a method for extrinsic inference on a Riemannian manifold using a kernel approach while preserving the topology of the entire dataset. We show that computing the Gramian matrix using geodesic distances, on a complete Riemannian manifold with a unique minimizing geodesic between each pair of points, provides a feature mapping that preserves the topology of the data points in the feature space. The proposed approach is evaluated on real datasets composed of EEG signals of patients with two different mental disorders, as well as texture, visual object class, and tracking datasets. To assess the effectiveness of our scheme, the performance of the extracted features is compared with other state-of-the-art techniques for extrinsic inference over the symmetric positive definite (SPD) Riemannian manifold. Experimental results show the superior accuracy of the proposed approach over approaches that use the kernel trick to compute similarity on SPD manifolds without considering the topology of the dataset, or that preserve topology only partially.
    Keywords: Kernel trick, Riemannian manifold, Geometry preservation, Gramian matrix
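    For orientation, one standard way to obtain a Gram matrix from pairwise geodesic distances is sketched below: double-centre the squared-distance matrix (the classical MDS construction), here paired with the affine-invariant geodesic distance on the SPD manifold as an example metric. Whether this exact construction matches the paper's Gramian is an assumption made only for illustration.

    ```latex
    d_g(X_i, X_j) = \bigl\| \log\!\bigl(X_i^{-1/2} X_j X_i^{-1/2}\bigr) \bigr\|_F ,
    \qquad
    G = -\tfrac{1}{2}\, J D J ,
    \quad D_{ij} = d_g^2(X_i, X_j),
    \quad J = I_n - \tfrac{1}{n}\mathbf{1}\mathbf{1}^{\top}
    ```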
  • N. Ashrafi Payaman, M.R. Kangavari * Pages 335-340
    One solution for processing and analyzing massive graphs is summarization. Generating a high-quality summary is the main challenge of graph summarization. In order to generate a summary of better quality for a given attributed graph, both structural and attribute similarities must be considered. Two measures, density and entropy, are used to evaluate the quality of structural and attribute-based summaries, respectively. For an attributed graph, a high-quality summary is one that covers both the graph structure and its attributes with user-specified degrees of importance. Recently, two methods have been proposed for summarizing a graph based on both graph structure and attribute similarities. In this paper, a new method for hybrid summarization of a given attributed graph is proposed, and the quality of the summary generated by this method is compared with the recently proposed method for this purpose. Experimental results show that our proposed method generates a summary of better quality.
    Keywords: Graph Summarization, Super-node, Super-edge, Structural similarity, Attribute-based Similarity
  • H.R. Keshavarz, M. Saniee Abadeh * Pages 341-353
    In Web 2.0, people are free to share their experiences, views, and opinions. One of the problems that arises in Web 2.0 is the sentiment analysis of texts produced by users in outlets such as Twitter. One of the main tasks of sentiment analysis is subjectivity classification; our aim is to classify the subjectivity of tweets. To this end, we create subjectivity lexicons in which words are divided into objective and subjective words. To create these lexicons, we make use of three metaheuristic methods. We extract two meta-level features, which give the counts of objective and subjective words in a tweet according to the lexicons, and then classify the tweets based on these two features. Our method outperforms the baselines in terms of accuracy and F-measure. Among the three metaheuristics, the genetic algorithm performs better than simulated annealing and asexual reproduction optimization, and it also outperforms all the baselines in terms of accuracy on two of the three assessed datasets. The created lexicons also give insight into the objectivity and subjectivity of words.
    Keywords: Evolutionary Computation, genetic algorithms, Natural Language Processing, Prediction Methods, Sentiment Analysis
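    A toy sketch of the two meta-level features described above: counts of subjective and objective lexicon words in a tweet. The lexicons here are tiny placeholders; the paper builds them with the three metaheuristics.

    ```python
    subjective = {"love", "awful", "great", "hate"}            # placeholder lexicon
    objective = {"today", "report", "price", "city"}           # placeholder lexicon

    def meta_features(tweet: str):
        tokens = tweet.lower().split()
        n_subj = sum(t in subjective for t in tokens)          # subjective-word count
        n_obj = sum(t in objective for t in tokens)            # objective-word count
        return n_subj, n_obj

    print(meta_features("I love this great city"))             # -> (2, 1)
    ```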
  • A. M. Mousavi *, M. Khodadadi Pages 355-363
    Important parameters in the design and implementation of combinational logic circuits are usually the number of gates, the number of transistors, and the number of levels used in the design of the circuit. In this regard, various evolutionary paradigms with different competencies have recently been introduced. However, while being advantageous, evolutionary paradigms also have some limitations, including: a) lack of confidence in reaching the correct answer, b) long convergence time, and c) restrictions on tests performed with higher numbers of input variables. In this paper, we have implemented a genetic programming approach that, given a Boolean function, outputs an equivalent circuit such that the truth table is covered and the minimum number of gates (and, to some extent, transistors and levels) is used. Furthermore, our implementation addresses the aforementioned limitations by incorporating a self-repairing feature (improving limitation a) and by efficient use of the conceivable coding space of the problem, which virtually brings about a kind of parallelism and improves the convergence time (improving limitation b). Moreover, we have applied our method to Boolean functions with higher numbers of inputs (improving limitation c). These issues are verified through multiple tests and the results are reported.
    Keywords: Genetic Programming, Logical Circuits, Design Optimization
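    An approach-level sketch of fitness evaluation in genetic-programming-based logic synthesis: a candidate circuit is scored by how many truth-table rows it reproduces, with a small penalty on gate count. The list-of-gates encoding below is illustrative and is not the paper's chromosome.

    ```python
    from itertools import product

    OPS = {"AND": lambda x, y: x & y, "OR": lambda x, y: x | y, "XOR": lambda x, y: x ^ y}

    def evaluate(circuit, inputs):
        signals = list(inputs)                        # primary inputs first
        for op, a, b in circuit:                      # each gate reads earlier signals
            signals.append(OPS[op](signals[a], signals[b]))
        return signals[-1]                            # output of the last gate

    def fitness(circuit, target_fn, n_inputs):
        correct = sum(evaluate(circuit, bits) == target_fn(bits)
                      for bits in product([0, 1], repeat=n_inputs))
        return correct - 0.01 * len(circuit)          # coverage minus gate-count penalty

    # Example: a two-gate circuit computing (x0 AND x1) XOR x2
    circ = [("AND", 0, 1), ("XOR", 3, 2)]
    print(fitness(circ, lambda b: (b[0] & b[1]) ^ b[2], 3))   # all 8 rows correct
    ```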
  • M. Kosari *, M. Teshnehlab Pages 365-373
    Although mathematicians have studied fractional calculus for many years, its application in engineering, especially in modeling and control, does not have many antecedents. Since there is much freedom in choosing the order of the differentiator and integrator in fractional calculus, it is possible to model physical systems accurately. This paper deals with the time-domain identification of fractional-order chaotic systems, where the conventional derivative is replaced by a fractional one with the help of a non-integer differentiation operator. This operator is itself approximated by an N-dimensional system composed of an integrator and a phase-lead filter. A hybrid particle swarm optimization (PSO) and genetic algorithm (GA) method is applied to estimate the parameters of the approximated nonlinear fractional-order chaotic system, which is modeled by a state-space representation. The feasibility of this approach is demonstrated by identifying the parameters of an approximated fractional-order Lorenz chaotic system. The performance of the proposed algorithm is compared with the genetic algorithm (GA) and standard particle swarm optimization (SPSO) in terms of parameter accuracy and cost function. To evaluate the identification accuracy, the time-domain output error is used as the fitness function for parameter optimization. Simulation results show that the proposed method is more successful than the other algorithms for parameter identification of fractional-order chaotic systems.
    Keywords: Parameter identification, chaotic system, Particle Swarm Optimization, Genetic Algorithm, Fractional calculus
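    A minimal sketch of the time-domain output-error fitness mentioned above: a candidate parameter vector is scored by the squared error between the measured output and the output of the approximated model simulated with those parameters. `simulate_model` is a placeholder for the approximated fractional-order Lorenz model, which is not reproduced here.

    ```python
    import numpy as np

    def output_error_fitness(theta, y_measured, simulate_model):
        """Sum of squared time-domain output errors for candidate parameters theta."""
        y_hat = simulate_model(theta)                 # simulate the approximated model
        return float(np.sum((np.asarray(y_measured) - np.asarray(y_hat)) ** 2))
    ```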
  • M. A. Saadtjoo, S. M. Babamir * Pages 375-385
    Search-based optimization methods have been used for software engineering activities such as software testing. In the field of software testing, search-based test data generation refers to the application of meta-heuristic optimization methods to generate test data that cover the code space of a program. Automatic test data generation that can cover all the paths of a program is known to be a major challenge.
    The paper establishes a new cost function for automatic test data generation, which can traverse the non-iterative paths of software control flow graphs. This function is compared with similar cost functions proposed in other articles, and the results indicate the superior performance of the proposed function. Another innovation in this paper is the application of the Imperialist Competitive Algorithm to automatic test data generation along with the proposed cost function. Automatic test data generation is implemented through the Imperialist Competitive Algorithm as well as the Genetic and Particle Swarm Optimization algorithms for three software programs with different search space sizes. The algorithms are compared with each other in terms of convergence speed, computational time, and local search. Test data generated by the proposed method achieves better results than the other algorithms in terms of the number of non-iterative paths found, convergence speed, and computational time as the search space of the software's control flow graph grows.
    Keywords: software testing, Imperialist Competitive Algorithm (ICA), test data generation, Control Flow Graph (CFG), program complexity
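    For context, the classic branch-distance idea from search-based testing is sketched below (this is not the paper's new cost function): each predicate on the target path contributes a distance that is zero exactly when the predicate is satisfied, and the optimizer minimizes the sum.

    ```python
    def branch_distance_eq(a, b):
        """Distance to satisfying the branch condition a == b."""
        return abs(a - b)

    def path_cost(inputs, predicates):
        """Sum of branch distances over the target path; 0 means the path is covered."""
        return sum(d(*inputs) for d in predicates)

    print(path_cost((7, 10), [branch_distance_eq]))    # -> 3
    ```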
  • M. Rezvani * Pages 387-397
    Cloud computing has become an attractive target for attackers, as the mainstream technologies in the cloud, such as virtualization and multi-tenancy, permit multiple users to utilize the same physical resource, thereby posing the so-called problem of internal-facing security. Moreover, traditional network-based intrusion detection systems (IDSs) are ineffective when deployed in cloud environments, because such IDSs employ only network information in their detection engines, which makes them ineffective against cloud-specific vulnerabilities. In this paper, we propose a novel assessment methodology for anomaly-based IDSs for cloud computing that takes into account both network and system-level information for generating the evaluation dataset. In addition, our approach deploys IDS sensors in each virtual machine in order to develop a cooperative anomaly detection engine. The proposed assessment methodology is then deployed in a testbed cloud environment to generate an IDS dataset that includes both network and system-level features. Finally, we evaluate the performance of several machine learning algorithms over the generated dataset. Our experimental results demonstrate that the proposed IDS assessment approach is effective for attack detection in the cloud, as most of the algorithms are able to identify the attacks with a high level of accuracy.
    Keywords: intrusion detection system, cloud computing, Classification, dataset generation, IDS assessment
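    A short sketch of the final evaluation step described above: several off-the-shelf classifiers are trained on the generated dataset and compared by cross-validated accuracy. X and y are placeholders for the combined network and system-level features and the attack labels; the classifier choice is illustrative.

    ```python
    from sklearn.model_selection import cross_val_score
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.naive_bayes import GaussianNB
    from sklearn.neighbors import KNeighborsClassifier

    for clf in (RandomForestClassifier(), GaussianNB(), KNeighborsClassifier()):
        score = cross_val_score(clf, X, y, cv=5).mean()     # mean detection accuracy
        print(type(clf).__name__, round(score, 3))
    ```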
  • A. Pouramini *, S. Khaje Hassani, Sh. Nasiri Pages 399-407
    In this paper, we present an approach and a visual tool, called HWrap (Handle-Based Wrapper), for creating web wrappers to extract data records from web pages. In our approach, we rely mainly on the visible page content to identify data regions on a web page. Our extraction algorithm is inspired by the way a human user scans the page content for specific data. In particular, we use text features such as textual delimiters, keywords, constants or text patterns, which we call handles, to construct patterns for the target data regions and data records. We offer a polynomial algorithm in which these patterns are checked against the page elements in a mixed bottom-up and top-down traversal of the DOM tree. The extracted data is directly mapped onto a hierarchical XML structure, which forms the output of the wrapper. The wrappers generated by this method are robust and independent of the HTML structure; therefore, they can be adapted to similar websites to gather and integrate information.
    Keywords: Web Data Record Extraction, Web Wrapper Generation, Web Information Extraction
  • M. Aghazadeh, F. Soleimanian Gharehchopogh * Pages 409-415
    The size and complexity of websites have grown significantly in recent years. In line with this growth, the need to maintain most of the resources has intensified. Content Management Systems (CMSs) are software systems that were introduced in response to the increasing demands of users. With the advent of content management systems, factors such as domains, development of pre-designed modules, graphics, optimization and alternative support have become factors that influence the cost of software and web-based projects. Consequently, these factors have challenged the previously introduced cost estimation models. This paper provides a hybrid method for estimating the cost of websites designed with content management systems. The proposed method uses a combination of a genetic algorithm and a Multilayer Perceptron (MLP). Results have been evaluated by comparing the numbers of correctly and incorrectly classified instances and the Kappa coefficient, which measures the agreement between the sets. According to the obtained results, the Kappa coefficient on the testing data set equals 0.82 for the proposed method, 0.06 for the genetic algorithm and 0.54 for the MLP Artificial Neural Network (ANN). Based on these results, the proposed method can be used to estimate the cost of websites designed with content management systems.
    Keywords: Genetic Algorithm, Multi-Layer Perceptron Artificial Neural Network, Website Cost Estimation, Content Management System
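    For reference, the Kappa coefficient used above to evaluate agreement is Cohen's kappa, defined from the observed agreement p_o and the expected chance agreement p_e:

    ```latex
    \kappa = \frac{p_o - p_e}{1 - p_e}
    ```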
  • F. Moslehi, A.R. Haeri *, A.R. Moini Pages 417-437
    In today's world, most financial transactions are carried out through electronic instruments and in the context of information technology and the Internet. Disregarding the application of new technologies in this field and relying only on traditional methods will result in financial loss and customer dissatisfaction. The aim of the present study is to survey and analyze the use of electronic payment instruments in banks across the country, using statistics and information retrieved from the Central Bank together with data mining techniques. For this purpose, a label was first assigned to each record according to the volume of transactions carried out, with the help of the K-Means algorithm; then hidden patterns of e-payment instrument transactions were detected using the CART algorithm. The results of this study enable bank administrators to align their future policies in the field of e-payment with the detected patterns, in the interest of both the bank and its customers, and to provide higher quality services to their customers.
    Keywords: Banking, Data mining, Electronic payment instruments, Classification, CRISP-DM
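    A minimal sketch of the two-stage pipeline described above: K-Means assigns a volume-based cluster label to each record, and a CART tree trained on those labels exposes interpretable decision rules. X is a placeholder feature matrix of transaction volumes per e-payment instrument, and the cluster count is illustrative.

    ```python
    from sklearn.cluster import KMeans
    from sklearn.tree import DecisionTreeClassifier, export_text

    labels = KMeans(n_clusters=3, random_state=0).fit_predict(X)   # stage 1: label each record
    cart = DecisionTreeClassifier(max_depth=3).fit(X, labels)      # stage 2: CART on the labels
    print(export_text(cart))                                       # human-readable rules
    ```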
  • A. Jalalkamali *, N. Jalalkamali Pages 439-445
    The prediction of groundwater quality is very important for the management of water resources and environmental activities. The present study integrates a number of methods, such as Geographic Information Systems (GIS) and Artificial Intelligence (AI) methodologies, to predict groundwater quality in the Kerman plain (including HCO3- concentration and Electrical Conductivity (EC) of groundwater). This research investigates the abilities of the Adaptive Neuro-Fuzzy Inference System (ANFIS), the hybrid of ANFIS with a Genetic Algorithm (GA), and Artificial Neural Network (ANN) techniques to predict groundwater quality. Various combinations of monthly variables, namely rainfall and groundwater levels in the wells, were used by two different neuro-fuzzy models (standard ANFIS and ANFIS-GA) and an ANN. The results show that the ANFIS-GA method can produce a more parsimonious model with fewer rules (about a 300% reduction in the number of rules) compared to the ANFIS model, while at the same time improving the fitness criteria and thus model efficiency (38.4% in R2 and 44% in MAPE). The study also reveals that groundwater level fluctuations and rainfall are two important factors in predicting indices of groundwater quality.
    Keywords: Groundwater quality, GIS, Genetic Algorithm, Neuro-Fuzzy, ANN