The Role of Ontology and Knowledge Graph in Text Document Classification: A Review of Studies

Article Type:
Review Article (accredited ranking)
Abstract:
Purpose

With the growing use of the internet and the rising volume of electronically accessible documents on the web, automatic text classification has become a critical method for improving information retrieval and managing digital text collections. Text classification lets people search for and retrieve information more accurately and quickly. Automatic document classification assigns documents to predefined classes so that, drawing on semantic relationships, documents within a class are as similar as possible to one another and as dissimilar as possible from documents in other classes. This study investigates the application of ontologies and knowledge graphs in automatic text document classification.

Method

This study reviewed research and documents on the application of semantic tools, such as ontologies and knowledge graphs, to text document classification. To collect texts, the study searched three domestic databases (the "National Journal Database," the "Scientific Information Database of Jihad University," and "Marefate Danesh"), three internal databases ("Magiran," "SID," and "Civilica"), and three external citation databases ("Web of Science," "Scopus," and "Google Scholar"). Both the domestic and the external sources were searched with no restriction on publication period.

Findings

The reviewed texts show that the vector space model does not consider the semantic relationships between words and disregards word order in sentences. Because it neglects these semantic and syntactic relationships, the resulting representation of a document can differ from what the natural-language text actually means. Ontologies and knowledge graphs, in contrast, help strengthen machine learning models by capturing the meaning of entities and classes. These tools act as an external reference during the classification process and supply domain knowledge to classification models. In general, using them allows machines to interpret the meaning of the data they work with.
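
The two points above can be illustrated with a minimal sketch, which is not taken from the reviewed studies. The first part shows that a simple bag-of-words vector, one basic form of the vector space model, is identical for two sentences that differ only in word order. The second part uses a hypothetical toy mapping, TOY_ONTOLOGY, to show how an external reference can make two documents with different surface terms share a concept. All names, sentences, and mappings here are assumptions chosen for illustration.

```python
from collections import Counter


def bag_of_words(text, vocabulary):
    """Count how often each vocabulary term occurs; word order is discarded."""
    counts = Counter(text.lower().split())
    return [counts[term] for term in vocabulary]


# Two sentences with opposite meanings but the same words.
doc_a = "the dog bit the man"
doc_b = "the man bit the dog"

vocabulary = sorted(set(doc_a.split()) | set(doc_b.split()))
vec_a = bag_of_words(doc_a, vocabulary)
vec_b = bag_of_words(doc_b, vocabulary)
same_vector = vec_a == vec_b  # the model cannot tell the sentences apart

# Hypothetical toy ontology: surface term -> concept (illustrative only).
TOY_ONTOLOGY = {"car": "Vehicle", "automobile": "Vehicle", "truck": "Vehicle"}


def concept_bag(text, ontology):
    """Replace each token with its ontology concept when one is known."""
    return Counter(ontology.get(token, token) for token in text.lower().split())


doc_c = "car repair"
doc_d = "automobile repair"

# Overlap on raw terms versus overlap after mapping terms to concepts.
shared_terms = set(doc_c.split()) & set(doc_d.split())
shared_concepts = set(concept_bag(doc_c, TOY_ONTOLOGY)) & set(
    concept_bag(doc_d, TOY_ONTOLOGY)
)

print(same_vector, shared_terms, shared_concepts)
```

In this sketch the raw documents share only the term "repair", while the concept-mapped documents also share "Vehicle". A real system would draw such mappings from an actual ontology or knowledge graph rather than a hand-written dictionary.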

Conclusion

Applying ontologies and knowledge graphs to the classification of textual documents can strengthen the results of machine learning algorithms by supplying background knowledge. These tools can disambiguate the meanings of words in ambiguous sentences and address problems specific to natural language. Using ontologies and knowledge graphs can therefore help classify textual documents and improve the accuracy and efficiency of classification models. However, constructing and integrating ontologies and knowledge graphs is a tedious, time-consuming, and complex task, which limits the feasibility and practical application of these tools. For Persian, there are further limitations beyond these general problems, such as the specific features of the written language and technical constraints. The use of ontologies and knowledge graphs in text document classification therefore requires attention to linguistic limitations and technical complexity, and further development work is needed, especially for Persian.

Language:
Persian
Published:
Librarianship and Information Organization Studies, Volume:35 Issue: 2, 2024
Pages:
167 to 196
https://www.magiran.com/p2778318  
  • Mansouri, Ali
    Author (3)
    Associate Professor, Knowledge and Information Science, University of Isfahan, Isfahan, Iran