A Model for Persian Text Classification Using a Combination of Classification Methods

Article Type:
Research/Original Article (no valid ranking)
Abstract:

Information extraction, natural language processing, and machine learning techniques are widely used for text classification. The general goal is to assign documents to a fixed number of predetermined categories. Each document may belong to one category, to several, or to none. For any given document, the question is which of the categories it belongs to. This can be framed as a machine learning problem, so that any document can be assigned to a category automatically.

In this thesis, after data collection and text cleanup, features are weighted using normalized term frequency-inverse document frequency (norm TF-IDF). Features are then selected in two stages, using document frequency (DF) and chi-square (SChi), and their dimensionality is reduced using principal component analysis (PCA). In the final stage, the proposed model is implemented by combining 21 support vector machines (SVMs), and its accuracy is assessed using 10-fold cross-validation. Experimental results show that the model classifies texts into seven categories with an accuracy of 91.86%, which is higher than that of earlier work.
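The first steps of the pipeline described above can be illustrated with a minimal sketch. It is not the authors' implementation. The function names, the IDF formula log(N / df), the L2 normalization, and the toy English tokens are all assumptions made for illustration. The abstract also does not say how the 21 SVMs are combined; the last helper only shows that a one-vs-one scheme over seven categories would yield exactly 21 pairwise classifiers, which is one reading consistent with that number.

```python
import math
from collections import Counter
from itertools import combinations


def document_frequency(docs):
    """Count, for each term, the number of documents that contain it."""
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # each document counts a term at most once
    return df


def select_by_df(df, min_df):
    """DF filtering: keep terms that occur in at least min_df documents."""
    return sorted(term for term, count in df.items() if count >= min_df)


def normalized_tfidf(docs, vocabulary, df):
    """Weight each document by TF-IDF, then scale it to unit L2 length."""
    n_docs = len(docs)
    vectors = []
    for tokens in docs:
        tf = Counter(tokens)
        # Assumed IDF form: log(N / df). The abstract gives no formula.
        weights = [tf[t] * math.log(n_docs / df[t]) for t in vocabulary]
        norm = math.sqrt(sum(w * w for w in weights))
        # A document with no selected terms stays an all-zero vector.
        vectors.append([w / norm if norm > 0 else 0.0 for w in weights])
    return vectors


def one_vs_one_pairs(categories):
    """List every unordered pair of categories (assumed one-vs-one scheme)."""
    return list(combinations(categories, 2))


if __name__ == "__main__":
    # Hypothetical tokenized documents standing in for cleaned Persian text.
    docs = [
        ["sport", "football", "goal"],
        ["economy", "market", "goal"],
        ["sport", "market", "football"],
    ]
    df = document_frequency(docs)
    vocabulary = select_by_df(df, min_df=2)
    vectors = normalized_tfidf(docs, vocabulary, df)
    print(vocabulary)
    print(vectors[0])
    print(len(one_vs_one_pairs(range(7))))
```

The chi-square selection, PCA, and SVM training stages are omitted here; in practice they would typically come from a numerical library rather than be written by hand.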

Language:
Persian
Published:
Journal of Southern Communication Engineering, Volume: 10, Issue: 38, 2020
Pages:
61 to 72
https://www.magiran.com/p2417724