Using Conceptual Clustering to Extract Key Phrases and Related Terms: A Case Study of Scientific Communication Texts

Article Type:
Case Study (accredited journal ranking)
Abstract:

Scientific communication encompasses the various types and forms of communication, carried out through communication methods and tools, that aim to exchange scientific knowledge and information. To understand scientific and research communication comprehensively and to improve it, it is crucial to identify its key terms and concepts. The main objective of this research is therefore to identify and conceptually cluster key terms in the field of scientific communication using text mining techniques. The study is quantitative in approach and applied in purpose, and it used various text mining techniques to identify and cluster key terms in this field. The research population consists of 558 abstracts of articles related to scientific communication, retrieved from databases such as Web of Science and Scopus, and census sampling was used. First, all noun phrases were extracted using available libraries. Each compound phrase was then decomposed into its constituent words, and the average of those words' vectors from the GloVe dictionary was calculated, assigning a numerical vector to each compound phrase. For unknown terms that did not exist in the GloVe dictionary, the researchers created an equivalent expression using existing vocabulary. Clustering was then performed on these vectors using the K-means method. The findings revealed that, of the 17,930 extracted keywords, 13,651 were noun phrases. In addition, 16% of the terms in the field of scientific communication were single words and 84% were compound. After the compound-term vectors were created and clustered, 40 conceptual clusters were formed from 792 phrases or terms in the field. After adjusting the clusters and removing weak ones, the researchers finally identified 22 clusters in the field of scientific communication.
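The phrase-vector step described above can be sketched as follows: split a compound phrase into words, look each word up in an embedding table, and average the word vectors. This is a minimal sketch under stated assumptions, not the authors' code: `TOY_EMBEDDINGS` is an invented 3-dimensional table standing in for real GloVe vectors, `OOV_EQUIVALENTS` is a hypothetical hand-written mapping that imitates describing unknown terms with existing vocabulary, and `phrase_vector` is an illustrative name.

```python
from typing import Dict, List, Optional

Vector = List[float]

# Toy 3-dimensional embeddings standing in for a real GloVe table.
# Real GloVe vectors have 50-300 dimensions; these values are invented.
TOY_EMBEDDINGS: Dict[str, Vector] = {
    "open": [0.2, 0.8, 0.1],
    "access": [0.4, 0.6, 0.3],
    "peer": [0.9, 0.1, 0.5],
    "review": [0.7, 0.3, 0.5],
    "alternative": [0.5, 0.5, 0.0],
    "metrics": [0.1, 0.9, 0.4],
}

# Hypothetical equivalents for words missing from the embedding table,
# imitating the idea of describing unknown terms with existing vocabulary.
OOV_EQUIVALENTS: Dict[str, str] = {
    "altmetrics": "alternative metrics",
}


def phrase_vector(
    phrase: str,
    embeddings: Dict[str, Vector] = TOY_EMBEDDINGS,
    equivalents: Dict[str, str] = OOV_EQUIVALENTS,
) -> Optional[Vector]:
    """Average the vectors of a phrase's known words; None if none are known."""
    known: List[str] = []
    for token in phrase.lower().split():
        if token in embeddings:
            known.append(token)
        elif token in equivalents:
            # Replace the unknown word with the words of its equivalent expression.
            known.extend(w for w in equivalents[token].split() if w in embeddings)
        # Any other unknown word is simply skipped.
    if not known:
        return None
    dim = len(embeddings[known[0]])
    total = [0.0] * dim
    for word in known:
        for i, value in enumerate(embeddings[word]):
            total[i] += value
    # Component-wise mean over all (possibly expanded) known words.
    return [value / len(known) for value in total]


if __name__ == "__main__":
    print([round(x, 3) for x in phrase_vector("open access")])  # [0.3, 0.7, 0.2]
    print([round(x, 3) for x in phrase_vector("altmetrics")])   # [0.3, 0.7, 0.2]
```

Averaging ignores word order, so "review peer" and "peer review" receive the same vector; this is a known limitation of simple compositional averaging.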
The results of this research include the identification of concepts and components of scientific communication in the form of conceptual clusters and their elements. One of the most significant findings was the assignment of numerical vectors to compound phrases based on the vectors of their constituent words. These vectors were then used to cluster and categorize the phrases, as well as to improve and correct some of the clusters. Because this method takes semantic aspects and learning into account when clustering and categorizing concepts, it can support more precise analysis of key terms and phrases in various fields.
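Clustering phrase vectors with K-means, as reported in the abstract, can be illustrated with a minimal pure-Python version of Lloyd's algorithm. This is a sketch under stated assumptions: the phrases, their 2-dimensional vectors, `k=2`, and the deterministic farthest-point initialization are all illustrative choices, not the authors' data or settings (the study itself produced 40 clusters from 792 phrases). In practice a library implementation such as scikit-learn's `KMeans` would usually be used.

```python
from typing import List, Tuple

Vector = List[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points: List[Vector]) -> Vector:
    """Component-wise mean of a non-empty list of vectors."""
    dim = len(points[0])
    return [sum(p[i] for p in points) / len(points) for i in range(dim)]


def kmeans(points: List[Vector], k: int, max_iter: int = 100) -> Tuple[List[int], List[Vector]]:
    """Lloyd's K-means; returns a cluster label per point and the centroids."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    # Deterministic farthest-point initialization (an assumption for this sketch):
    # start from the first point, then repeatedly add the point farthest
    # from its nearest already-chosen centroid.
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(farthest))
    labels = [-1] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[j] = mean(members)
    return labels, centroids


if __name__ == "__main__":
    # Hypothetical phrases with invented 2-D vectors.
    phrases = ["open access", "preprint server", "open data",
               "peer review", "review process", "referee report"]
    vectors = [[0.1, 0.9], [0.2, 0.8], [0.15, 0.85],
               [0.9, 0.1], [0.85, 0.2], [0.8, 0.15]]
    labels, _ = kmeans(vectors, k=2)
    for c in sorted(set(labels)):
        print(c, [ph for ph, label in zip(phrases, labels) if label == c])
    # 0 ['open access', 'preprint server', 'open data']
    # 1 ['peer review', 'review process', 'referee report']
```

K-means requires choosing the number of clusters in advance, which is one reason a post-processing step such as adjusting clusters and discarding weak ones can be useful after an initial run.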

Language:
Persian
Published:
Journal of Information Processing and Management, Volume 39, Issue 4, 2024
Pages:
1477 to 1506
https://www.magiran.com/p2778329  
Authors:
  • Shabani, Ahmad
    Corresponding Author (2)
    Professor of Information and Library Science, Faculty of Education and Psychology, University of Isfahan, Isfahan, Iran