Table of Contents

Journal of Artificial Intelligence and Data Mining
Volume:3 Issue: 2, Summer-Autumn 2015

  • Publication date: 1394/10/28 (Solar Hijri)
  • Number of titles: 12
  • J. Hamidzadeh Pages 121-130
    In instance-based learning, a training set is given to a classifier for classifying new instances. In practice, not all of the information in the training set is useful to the classifier, so it is convenient to discard irrelevant instances. This process, known as instance reduction, is an important task because it can reduce the time needed for classification or training. Instance-based learning methods often face the difficulty of choosing which instances must be stored for use at test time; storing too many instances may result in large memory requirements and slow execution. In this paper, a Distance-based Decision Surface (DDS) is first proposed as a separating surface between the classes. An instance reduction method based on the DDS, named IRDDS (Instance Reduction based on Distance-based Decision Surface), is then proposed. The DDS is used together with a genetic algorithm to select a reference set for classification. IRDDS selects the most representative instances, satisfying both of the following objectives: high accuracy and a high reduction rate. The performance of IRDDS has been evaluated on real-world data sets from the UCI repository using 10-fold cross-validation. The experimental results are compared with those of several state-of-the-art methods and show the superiority of the proposed method in terms of both classification accuracy and reduction percentage.
    Keywords: Instance Reduction (IR), Distance-based Decision Surface (DDS), Instance-based Learning (IL), Support Vector Machine (SVM), Genetic Algorithm (GA)
  • A. Telikani, A. Shahbahrami, R. Tavoli Pages 131-140
    Data sanitization is a process used to promote the sharing of transactional databases among organizations and businesses. It alleviates the concerns of individuals and organizations regarding the disclosure of sensitive patterns. It transforms the source database into a released database in which counterparts cannot discover the sensitive patterns, so that data confidentiality is preserved against association rule mining. The process relies strongly on minimizing the impact of sanitization on data utility, that is, on minimizing the number of lost patterns: non-sensitive patterns that can no longer be mined from the sanitized database. This study proposes a data sanitization algorithm that hides sensitive patterns, in the form of frequent itemsets, while controlling the impact of sanitization on data utility by estimating the impact factor of each modification on non-sensitive itemsets. The proposed algorithm has been compared with the Sliding Window size Algorithm (SWA) and Max-Min1 in terms of execution time, data utility and data accuracy. Data accuracy is defined as the ratio of deleted items to the total support values of sensitive itemsets in the source dataset. Experimental results demonstrate that the proposed algorithm outperforms SWA and Max-Min1 in maximizing data utility and data accuracy, and that it provides better execution time than SWA and Max-Min1 as the numbers of sensitive itemsets and transactions scale up.
    Keywords: Data Sanitization, Association rule hiding, Frequent Itemsets, Association Rule Mining, Privacy preserving data mining
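    The ideas of itemset support, hiding a sensitive itemset, and counting lost non-sensitive patterns can be illustrated with a minimal sketch. This is not the authors' algorithm: the victim item is simply given by the caller rather than chosen by an impact-factor estimate, and the names `support`, `hide_itemset` and `lost_patterns` are hypothetical.

```python
from typing import FrozenSet, List, Set

Transaction = Set[str]


def support(db: List[Transaction], itemset: FrozenSet[str]) -> int:
    """Count the transactions that contain every item of the itemset."""
    return sum(1 for t in db if itemset <= t)


def hide_itemset(db: List[Transaction], sensitive: FrozenSet[str],
                 min_support: int, victim: str) -> List[Transaction]:
    """Naive hiding (an assumption, not the paper's method): delete the
    victim item from supporting transactions until the sensitive itemset
    is no longer frequent."""
    sanitized = [set(t) for t in db]  # work on a copy of the database
    for t in sanitized:
        if support(sanitized, sensitive) < min_support:
            break  # the sensitive itemset is already hidden
        if sensitive <= t:
            t.discard(victim)
    return sanitized


def lost_patterns(db: List[Transaction], sanitized: List[Transaction],
                  non_sensitive: List[FrozenSet[str]],
                  min_support: int) -> List[FrozenSet[str]]:
    """Non-sensitive itemsets that were frequent before sanitization
    but are not frequent afterwards (the utility loss)."""
    return [p for p in non_sensitive
            if support(db, p) >= min_support
            and support(sanitized, p) < min_support]


db = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
sensitive = frozenset({"a", "b"})
sanitized = hide_itemset(db, sensitive, min_support=3, victim="b")
lost = lost_patterns(db, sanitized,
                     [frozenset({"b", "c"}), frozenset({"a", "c"})],
                     min_support=3)
```

    In this toy database, deleting "b" from one transaction hides {a, b} but also makes the non-sensitive itemset {b, c} infrequent, which is the kind of side effect a sanitization algorithm tries to limit.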
  • F. Alibakhshi, M. Teshnehlab, M. Alibakhshi, M. Mansouri Pages 141-147
    The stability of the learning rate in neural network identifiers and controllers is a challenging issue that attracts great interest from neural network researchers. This paper proposes an adaptive gradient descent algorithm with stable learning laws for a modified dynamic neural network (MDNN) and studies its stability. A stable learning algorithm for the parameters of the MDNN is also proposed. The proposed method yields constraints on the learning rate. Lyapunov stability theory is applied to study the proposed algorithm and guarantees the stability of the learning algorithm. In the proposed method, the learning rate can be calculated online, which provides an adaptive learning rate for the MDNN structure. Simulation results are given to validate the approach.
    Keywords: Gradient Descent Algorithm, Identifier, Learning Rate, Lyapunov Stability Theory
  • D. Darabian, H. Marvi, M. Sharif Noughabi Pages 149-156
    Mel-frequency cepstral coefficients (MFCCs) are the most widely used features in speech recognition, but they are very sensitive to noise. To achieve satisfactory performance in Automatic Speech Recognition (ASR) applications, this paper introduces a new noise-robust set of MFCC vectors estimated through the following steps. First, spectral mean normalization is applied as a pre-processing step to the noisy original speech signal. The pre-emphasized speech is segmented into overlapping time frames and windowed by a modified Hamming window. Higher-order autocorrelation coefficients are then extracted, and the lower-order autocorrelation coefficients are eliminated. The resulting sequence is passed through an FFT block, and the power spectrum of the output is calculated. A Gaussian-shaped filter bank is applied to the results. A logarithm and two compensation blocks, one for mean subtraction and the other a root block, are then applied, and a DCT is the last step. An MLP neural network is used to classify the results and to evaluate the performance of the proposed MFCC method. Speech recognition experiments on various tasks indicate that the proposed algorithm is more robust than traditional ones in noisy conditions.
    Keywords: MFCC, Autocorrelation, Gaussian Filter Bank, Root, Mean Normalization
  • V. Derhami, J. Paksima, H. Khajah Pages 157-168
    The main challenge of a search engine is ranking web documents to provide the best response to a user's query. Despite the huge number of results retrieved for a query, users examine only the first few; therefore, placing the relevant results in the top ranks is of great importance. In this paper, a ranking algorithm based on reinforcement learning and user feedback, called RL3F, is proposed. In the proposed algorithm, the ranking system is the agent of the learning system, and selecting documents to display to the user is the agent's action. The reinforcement signal is calculated from the user's clicks on documents. Action values are computed for each feature. In each learning cycle, the documents are sorted for the next query, and documents are selected at random, according to their positions in the ranked list, to be shown to the user. The learning process continues until training is completed. The LETOR3 benchmark is used to evaluate the proposed method. Evaluation results indicate that the proposed method is more effective than the other methods considered for comparison in this paper. The advantage of the proposed algorithm is that it uses several document features and the user's feedback simultaneously.
    Keywords: Search Engine, Ranking, Reinforcement Learning, User Feedback, Web Documents
  • E. Golpar-Rabooki, S. Zarghamifar, J. Rezaeenour Pages 169-179
    Opinion mining analyzes user reviews to extract opinions, sentiments and demands in a specific area, and can play an important role in major decisions in that area. In general, opinion mining processes user reviews at three levels: document, sentence and feature. The feature level receives more attention than the other two because it analyzes the orientation of different aspects of an area. In this paper, two methods are introduced for feature extraction. The proposed methods consist of four main stages. In the first stage, an opinion-mining lexicon for Persian is created; this lexicon is used to determine the orientation of users' reviews. The second stage is preprocessing, which includes unification of writing, tokenization, part-of-speech tagging and syntactic dependency parsing of the documents. The third stage extracts features using two methods: frequency-based feature extraction and association-rule-based feature extraction. In the fourth stage, the features and the polarities of the review words extracted in the previous stage are refined, and the final polarity of each feature is determined. To assess the proposed techniques, sets of user reviews in two domains, universities and cell phones, were collected, and the results of the two methods were compared.
    Keywords: Opinion Mining, Feature Extraction, Opinion-mining Lexicon, Corpus, Parts-of-speech Tagging, Syntactic Dependency Parsing
  • M. Imani, H. Ghassemian Pages 181-190
    Hyperspectral sensors provide a large number of spectral bands. The massive and complex data structure of hyperspectral images presents a challenge to traditional data processing techniques. Therefore, reducing the dimensionality of hyperspectral images without losing important information is a very important issue for the remote sensing community. We propose overlap-based feature weighting (OFW) for supervised feature extraction from hyperspectral data. In the OFW method, the feature vector of each pixel of the hyperspectral image is divided into segments, and the weighted mean of the adjacent spectral bands in each segment is calculated as an extracted feature. The smaller the overlap between classes, the greater the class discrimination ability. Therefore, the inverse of the overlap between classes in each band (feature) is used as the weight for that band. The superiority of OFW over other supervised feature extraction methods, in terms of classification accuracy and computation time, is established on three real hyperspectral images in the small-sample-size situation.
    Keywords: Class Discrimination, Overlap, Feature Weighting, Feature Extraction, Hyperspectral
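    The weighting scheme described in this abstract can be sketched as follows. The abstract does not define how overlap is measured, so the measure used here (the length of the intersection of the two classes' mean ± standard deviation intervals in a band) is an assumption, as are the two-class setting, the equal-width segmentation and the names `band_overlap`, `band_weights` and `extract_features`.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def band_overlap(class_a: Sequence[Sequence[float]],
                 class_b: Sequence[Sequence[float]], band: int) -> float:
    """Assumed overlap measure: length of the intersection of the
    [mean - std, mean + std] intervals of the two classes in one band."""
    a = [pixel[band] for pixel in class_a]
    b = [pixel[band] for pixel in class_b]
    lo_a, hi_a = mean(a) - pstdev(a), mean(a) + pstdev(a)
    lo_b, hi_b = mean(b) - pstdev(b), mean(b) + pstdev(b)
    return max(0.0, min(hi_a, hi_b) - max(lo_a, lo_b))


def band_weights(class_a: Sequence[Sequence[float]],
                 class_b: Sequence[Sequence[float]],
                 n_bands: int, eps: float = 1e-6) -> List[float]:
    """Weight of each band = inverse of the class overlap in that band.
    eps (an assumption) avoids division by zero when overlap is 0."""
    return [1.0 / (band_overlap(class_a, class_b, b) + eps)
            for b in range(n_bands)]


def extract_features(pixel: Sequence[float], weights: Sequence[float],
                     n_segments: int) -> List[float]:
    """Split the bands into contiguous segments and return the weighted
    mean of each segment; the last segment absorbs any remainder."""
    n = len(pixel)
    size = n // n_segments
    features = []
    for s in range(n_segments):
        start = s * size
        end = n if s == n_segments - 1 else start + size
        w = weights[start:end]
        x = pixel[start:end]
        features.append(sum(wi * xi for wi, xi in zip(w, x)) / sum(w))
    return features


# Four bands reduced to two features; band 3 has the largest weight.
features = extract_features([2, 4, 10, 20], [1, 1, 1, 3], n_segments=2)
```

    Bands in which the classes overlap less receive larger weights, so they dominate the extracted feature of their segment.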
  • Z. Amiri, A. Pouyan, H. Mashayekhi Pages 191-201
    Recently, data collection from the seabed by means of underwater wireless sensor networks (UWSNs) has attracted considerable attention. Autonomous underwater vehicles (AUVs) are increasingly used as UWSNs in underwater missions. Events and environmental parameters in underwater regions have a stochastic nature, and the target area must be covered by sensors to observe and report events. A topology control algorithm characterizes how well a sensing field is monitored and how well pairs of sensors are mutually connected in a UWSN. Using a central controller to guide the AUVs' behavior is prohibitive because of ever-changing, unknown environmental conditions, limited bandwidth and lossy communication media. In this research, a completely decentralized three-dimensional topology control algorithm for AUVs is proposed, aimed at achieving maximal coverage of the target area. The algorithm enables AUVs to autonomously decide on and adjust their speed and direction based on information collected from their neighbors. Each AUV selects the best movement at each step by independently executing a Particle Swarm Optimization (PSO) algorithm. In the fitness function, the global average neighborhood degree is used as the upper limit on the number of neighbors of each AUV. Experimental results show that limiting the number of neighbors of each AUV can lead to more uniform network topologies with larger coverage. It is further shown that the proposed algorithm is more efficient in terms of major network parameters such as target area coverage, deployment time and average distance travelled by the AUVs.
    Keywords: underwater sensor networks, AUV, PSO Algorithm, three-dimensional topology control, distributed artificial intelligence
  • A. Khazaei, M. Ghasemzadeh Pages 203-208
    This paper compares clusters of aligned Persian and English texts obtained with the k-means method. Text clustering has many applications in various fields of natural language processing, and a great deal of research has been done on clustering English documents. The question arises whether those results extend to other languages. Since the goal of document clustering is to group documents by their content, the answer is expected to be yes. On the other hand, the many differences between languages could make the answer no. This research focuses on k-means, one of the basic and popular document clustering methods, and asks whether the clusters of aligned Persian and English texts obtained by k-means are similar. To answer this question, the Mizan English-Persian Parallel Corpus was used as the benchmark. After feature extraction using text mining techniques and dimension reduction with PCA, k-means clustering was performed. The morphological differences between English and Persian led to longer feature vectors for Persian, so in almost all experiments the English results were slightly richer than the Persian ones. Aside from these differences, the overall behavior of the Persian and English clusters was similar. This similarity shows that the results of k-means research on English can be extended to Persian. Finally, there is hope that, despite the many differences between languages, clustering methods may be extendable to other languages.
    Keywords: Clustering, Mizan English-Persian Parallel Corpus, K-means, Principal Component Analysis (PCA)
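    One way to quantify whether two clusterings of the same aligned documents are similar is the Rand index, sketched below. The abstract does not state which similarity measure the authors used, so this choice is an assumption, and `rand_index` is a hypothetical helper. Position i in both label lists refers to the same aligned document pair.

```python
from itertools import combinations
from typing import Sequence


def rand_index(labels_a: Sequence[int], labels_b: Sequence[int]) -> float:
    """Fraction of document pairs on which two clusterings agree:
    both put the pair in the same cluster, or both separate it."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both clusterings must label the same documents")
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        return 1.0  # zero or one document: trivially identical
    agree = sum(
        1 for i, j in pairs
        if (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
    )
    return agree / len(pairs)


# Hypothetical cluster labels for four aligned documents.
english = [0, 0, 1, 1]
persian = [1, 1, 0, 0]  # same grouping, different label names
same_grouping = rand_index(english, persian)
different_grouping = rand_index(english, [0, 1, 0, 1])
```

    The measure compares pairs rather than label values, so it is unaffected by the arbitrary numbering that k-means assigns to clusters in each language.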
  • R. Satpathy, V. B. Konkimalla, J. Ratha Pages 209-215
    The present work was designed to classify and differentiate dehalogenase enzymes from non-dehalogenases (other hydrolases) using the amino acid propensity at the core, at the surface, and in both parts together. The data sets were built individually by selecting 3D protein structures available in the PDB (Protein Data Bank). The core amino acids were predicted with the IPFP tool, and their structural propensity was calculated with in-house software, Propensity Calculator, which is available online. All data sets were finally grouped into two categories, dehalogenase and non-dehalogenase, using the Naïve Bayes, J-48, Random Forest, K-means clustering and SMO classification algorithms. In a comparison of the classification methods, the tree-based method (Random Forest) performed well, with a maximum classification accuracy of 98.88% on the core propensity data set. We therefore propose that core amino acid propensity could serve as a novel potential descriptor for the classification of enzymes.
    Keywords: Core Propensity, Classification Algorithm, Random Forest, Protein Data Bank, Dehalogenase, Non-dehalogenases
  • M. Aghaei, A. Dastfan Pages 217-224
    Harmonics in distribution systems have become an important problem due to the increase in nonlinear loads. This paper presents a new approach based on a graph algorithm for the optimal placement of passive harmonic filters in a multi-bus system that suffers from harmonic current sources. The objective is to minimize the network loss, the cost of the filters and the total harmonic distortion of voltage, while also improving the voltage profile at each bus. Four types of sub-graph are used as the search space of the optimization. The method handles standard capacitor sizes in filter planning, together with the associated costs. The objective function is not differentiable, but the solution process remains simple. The IEEE 30-bus test system is used for the placement of the passive filters, and simulations are carried out to show the applicability of the proposed method. The simulation results indicate that the method is effective and suitable for passive filter planning in a power system.
    Keywords: Harmonics, Passive Filter, Optimization, Graph Algorithm
  • R. Ghanizadeh, M. Ebadian Pages 225-234
    This paper presents a new control method for a three-phase four-wire Unified Power Quality Conditioner (UPQC) to deal with power quality problems under distorted and unbalanced load conditions. The proposed control approach combines instantaneous power theory with Synchronous Reference Frame (SRF) theory, optimized by a self-tuning filter (STF), and requires no measurement of load or filter currents. In this approach, the load and source voltages are used to generate the reference voltages of the series active power filter (APF), and the source currents are used to generate the reference currents of the shunt APF. The number of current measurements is therefore reduced and system performance is improved. The performance of the proposed control system is tested for power factor correction, source neutral current reduction, load balancing, and current and voltage harmonic mitigation in a three-phase four-wire system with distorted and unbalanced loads. Results obtained with MATLAB/SIMULINK show the effectiveness of the proposed control technique in comparison with the conventional p-q method.
    Keywords: Power Quality, Unified Power Quality Conditioner, Voltage Harmonic Mitigation, Current Harmonic Mitigation, Source Neutral Current Mitigation