Articles related to the keyword "topic modeling" in library and information management journals


Analysis of global research trends in information technology using topic modeling

Arman Sajedinejad*, Mohammad Rabiei

Journal of Studies in Library and Information Science, Year 16, No. 2 (Serial No. 48, Summer 1403 SH), pp. 1-20

Objective

The present study was conducted to identify and analyze research in the field of information technology, extract its topics, and provide scientometric information about those topics.

Methodology

In this paper, the topics of the information technology field are extracted, and the relationships among frequently used terms and their evolution over time are analyzed and finally categorized. For this purpose, topic modeling, a well-known method for clustering textual information, was used.

Findings

Ten-year trends in keyword changes across the set of studies were extracted, and after the papers were clustered, the important keywords of each cluster were extracted. The papers retrieved in the field of information technology were divided into 8 subject categories, ranging from hardware, communications, and networks to intelligent applications such as the Internet of Things. The frequently used keywords in these sources were found to have changed continually over time.

Conclusion

In information technology, topics are merging and shifting among the field's different categories. Given the shift in the direction of research in this field from hardware and communications toward applications and analysis, it appears that future scientific areas will take shape around everyday, value-adding applications that draw on data analysis and human-machine communication, and that applications of information technology in other sciences have become more visible. A focus on research that is less developmental and more applied, or that has created more added value on top of existing infrastructure, is also evident in these changes.

Keywords: Information technology, topic modeling, text analysis, scientometrics


Analysis of information technology research trends using topic modeling

Arman Sajedinejad *, Mohammad Rabiei

Journal of Studies in Library and Information Science, Volume:16 Issue: 2, 2024, PP 1 -20

Background and Objectives

IT's rapid progress and far-reaching impact on other scientific disciplines have not only necessitated significant changes in its own subjects but have also catalyzed extensive changes in the form, amount, and methodology of research in other fields. The objective of the present investigation was to analyze research conducted in the realm of information technology, extract its central themes, and furnish scientometric data pertaining to these themes.

Methodology

This paper explores the topics of the information technology field by extracting and categorizing the relationships between commonly used terms and their temporal evolution. To achieve this, the researchers employed topic modeling, a well-established method for clustering textual data. Topic modeling algorithms utilize statistical methods to analyze and interpret the primary words in documents, allowing for the examination of the presented issues and their interconnections and changes over time. Considering the rapid changes in the field of information technology, this paper drew upon materials spanning the last decade, including 10,000 papers sourced from top-tier journals featured in the Web of Science database.

Findings

The study extracted trends in keyword changes over the past decade and identified important keywords for each paper cluster after grouping them. The papers within the information technology domain were then categorized into eight themes, including hardware, communications, networks, and intelligent applications such as the Internet of Things. The study found that frequently used keywords have been continuously changing over time. The paper highlights that emerging keywords, including the Internet of Things, cloud computing, and Big Data, along with work areas such as Machine Learning and Deep Learning, are shaping the definition of information technology fields in the new era.

Discussion

Given the shift in research emphasis from hardware and communication to analysis and practical applications, it is likely that future scientific fields will focus on creating value through data analysis and human-machine communication in everyday applications, and information technology's relevance in other sciences will become more apparent. Future research can also concentrate on comparing global trends in information technology with domestic research, enabling the evaluation of the gap between the country's research and that of the world.

Keywords: Information Technology, Topic Modeling, Text Analysis, Scientometrics

Research/Original Article. Original language: Persian

Sentiment analysis of Twitter users regarding ChatGPT technology

Ameneh Khadivar*, Paria Oman, Fatemeh Abbasi

Information Management, Year 9, No. 1 (Serial No. 16, Spring and Summer 1402 SH), pp. 179-202

In recent years, the development of artificial intelligence has had a striking impact on many aspects of human life. One of the most important applications of AI is chatbots, and ChatGPT, one of the best known, promises to change how people interact with technology. As use of this kind of technology has spread, concerns about privacy and data security have emerged. Assessing these concerns can yield valuable insight into public perception and help improve privacy policies. While earlier research has mostly addressed the technical aspects of ChatGPT, examining public sentiment toward this transformative technology can help gauge its success or failure and identify its strengths and weaknesses. This study aims to examine how Twitter users perceive ChatGPT through sentiment analysis and topic modeling. First, 478,266 tweets were collected through the official Twitter API. Sentiment analysis was performed with the BERT model, one of the most advanced deep learning algorithms, and reached an accuracy of 82 percent. Topic modeling was performed with the BERT-based BERTopic algorithm, with a coherence of 0.632 (C_V) and -2.957 (U_Mass). The results show that the nine main topics discussed by users are: artificial intelligence, search engines, future jobs, question answering, education, programming, large language models, business, and health. According to the findings, users expressed more positive sentiment toward large language models, education, and business, while future jobs, health, and artificial intelligence met with more negative sentiment. Although neutral opinions made up the largest share of the data, positive tweets far outnumbered negative ones, which indicates general satisfaction with and optimism about ChatGPT.

Keywords: Sentiment analysis, topic modeling, ChatGPT, Twitter, BERT model


Sentiment Analysis of Twitter about ChatGPT

Ameneh Khadivar *, Paria Oman, Fatemeh Abbasi

Information management, Volume:9 Issue: 1, 2023, PP 179 -202

In recent years, we have witnessed significant advancements in artificial intelligence across many aspects of human life. One way AI can enhance human life is through the use of chatbots. A chatbot that has recently been introduced with much attention and is promised to revolutionize the way people interact with technology is ChatGPT. However, with the widespread use of AI chatbots, concerns about data privacy and security have emerged. Evaluating these concerns can offer insights into public perceptions and help improve data privacy policies. Previous research on this technology has mainly focused on its technical aspects, whereas understanding public sentiment about ChatGPT as a transformative technology can provide insights into its potential success or failure, as well as its strengths and weaknesses. In line with this, the present study aims to examine the perceptions of Twitter users regarding ChatGPT through sentiment analysis and topic modeling. A total of 478,266 tweets were collected via the official Twitter API, and following sentiment analysis using the BERT model—one of the advanced algorithms in deep learning—the results showed an accuracy of 82%. Additionally, through topic modeling using the BERTopic algorithm, based on BERT, the results achieved a coherence (C_V) score of 0.632 and a U_Mass score of -2.957. According to the study’s findings, the nine most discussed topics among Twitter users are: artificial intelligence, search engines, future jobs, answering questions, education, programming, large language models, business, and healthcare. The results indicate that users expressed the highest percentage of positive sentiment towards the topics of large language models, education, and business, while the most negative sentiments were expressed regarding future jobs, healthcare, and artificial intelligence. 
After neutral opinions, which made up the largest portion of the data, positive tweets significantly outnumbered negative ones, reflecting the public’s satisfaction and optimism towards ChatGPT technology.
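The U_Mass figure reported in this abstract is a coherence score based on how often a topic's top words co-occur in the same documents. The abstract does not say which implementation the authors used, so the following is only a minimal sketch of one common formulation: each ordered word pair contributes log((D(later, earlier) + 1) / D(earlier)), where D counts documents, and the contributions are averaged. The function name `umass_coherence` and the toy documents are hypothetical, and library implementations may differ in smoothing and averaging.

```python
import math
from itertools import combinations


def umass_coherence(topic_words, documents):
    """Average UMass-style coherence for an ordered list of topic words.

    documents is a list of token lists; D(...) counts the documents that
    contain all of the given words.
    """
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*words):
        return sum(1 for d in doc_sets if all(w in d for w in words))

    scores = []
    for i, j in combinations(range(len(topic_words)), 2):
        earlier, later = topic_words[i], topic_words[j]
        d_earlier = doc_freq(earlier)
        if d_earlier == 0:
            continue  # a word absent from the corpus contributes nothing
        scores.append(math.log((doc_freq(later, earlier) + 1) / d_earlier))
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical toy corpus, not data from the study.
toy_docs = [
    ["chatgpt", "ai", "jobs"],
    ["chatgpt", "ai"],
    ["ai", "education"],
    ["jobs", "future"],
]
toy_topic = ["ai", "chatgpt", "jobs"]
coherence = umass_coherence(toy_topic, toy_docs)
print(round(coherence, 4))
```

Under this formulation, scores closer to zero mean the topic's words co-occur more consistently, which is how a negative value such as the one reported is usually read.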

Keywords: Sentiment Analysis, Topic Modeling, ChatGPT, Twitter, BERT Model

Research/Original Article. Original language: Persian

Heuristical Research on Twelve Decades of Information and Knowledge Creation Utilizing Python and NLP

Farshid Danesh, Meisam Dastani *

International Journal of Information Science and Management, Volume:21 Issue: 3, Summer 2023, PP 363 -382

Publications on knowledge and information creation have grown significantly due to their importance in information and knowledge management. This study aims to discover and analyze the hidden thematic topics of information and knowledge creation publications. This applied research was performed using text mining techniques and an analytical approach. The research population comprises publications on knowledge and information creation from 1900 to 2021, retrieved from the Web of Science Core Collection (WOSCC). The data were analyzed with the Latent Dirichlet Allocation (LDA) algorithm in the Python programming language. A total of 48,265 documents were retrieved and analyzed. "Data production," "Health seeking behavior," "Human Brain and Information processing," "Decision-making models," "Knowledge production," "Information needs," and "Digital Literacy" are among the essential topics in order of publication rate. The results indicated that the spectrum of the fourteen topics covered a variety of dimensions, including "data and knowledge creation," "information processing," "information needs and behavior," "digital literacy," and "critical thinking." The findings reveal the conceptual relationships in the textual data and present the knowledge structure of information and knowledge creation. Based on this, it can be concluded that the creation of knowledge and information includes human mental and behavioral processes concerning knowledge.

Keywords: knowledge creation, Information Creation, Data Creation, Data Mining, text mining, Topic Modeling, Latent Dirichlet Allocation (LDA)

Research/Original Article. Original language: English

Bibliometric Analysis and Topic Modeling of Information Systems in Maternal Health Publications

Nadia Motamedi *, Javad Ghazimirsaeid, Fatemeh Sheikhshoaei, Mohammad Javad Mansourzadeh, Hossein Dehdarirad

International Journal of Information Science and Management, Volume:21 Issue: 2, Spring 2023, PP 85 -101

Due to the importance of maternal health for the development of society and the role of information systems in improving healthcare, this study aims to investigate and analyze the characteristics and topics of articles published in the field of information systems in maternal health. The articles were retrieved from the Web of Science (WoS) on October 23, 2021. The bibliometric indicators included the number of documents and citations, top journals, institutes, and countries. The co-authorship collaboration network of the countries was examined using the Bibliometrix 3.1 package and VOSviewer software (ver. 1.6.17). In addition to the bibliometric analysis, topic modelling was carried out with the Non-negative Matrix Factorization (NMF) algorithm in the Python programming language. Overall, 1140 original articles were published in the selected field in the WoS database between 1991 and 2021. The results showed a rising trend in the number of publications. The "University of London", the "London School of Hygiene & Tropical Medicine", and the "World Health Organization" (WHO) contributed the most to this field, in that order. Researchers from the USA with 372 (32.63%), Brazil with 267 (23.42%), and England with 150 (13.2%) documents had the most scientific collaboration in publishing in this area. In the country co-authorship network, the USA and England collaborated most often, on 38 articles. Based on the topic modelling analysis, five topic clusters were considered for this research: "maternal mortality", "child and infant mortality", "risk factors related to pregnancy and maternal health", "Geographic Information Systems (GISs)", and "data quality in Health Information Systems (HISs)". According to the research results, it can be concluded that there is a rising trend in the number of articles published in the field of information systems in maternal health. 
The USA, Brazil, and England have played a prominent role in scientific production in this regard. Given that this study gives a snapshot of the current status of the research topic and visualizes the collaboration between countries, the obtained results can guide future collaboration and encourage scientific institutes to expand their interactions.

Keywords: Bibliometrics, Topic Modeling, Maternal health, information systems

Research/Original Article. Original language: English

Automatic keyword extraction with LDA topic modeling: similarity to standard keywords and user evaluation

Nosrat Riahinia*, Farzaneh Shadanpour, Keyvan Borna, GholamAli Montazer

Human Information Interaction, Year 9, No. 3 (Autumn 1401 SH), pp. 1-22

Background and aim

This study examines the results of automatic keyword extraction from the tables of contents of Persian science e-books using LDA topic modeling, measures the similarity of the output keywords to standard keywords, and evaluates how users assess the machine-extracted keywords.

Method

This applied study is a text mining study and, in terms of the methods used, a mixed-methods study. LDA topic modeling was used to extract keywords from the books' tables of contents, and the model's results were evaluated in two ways: cosine similarity measurement and a qualitative study with users.

Findings

With a trimmed mean of 260.02 words, the tables of contents examined count as medium-length texts, and stop words make up about 20 percent of their words. The cosine similarity between the standard subject-heading keywords and the LDA output keywords was 0.0932, which is very low. Users fully agreed that the LDA output keywords represent the subject area of the corpus as a whole. For describing the subjects of individual documents, however, users ranked the standard subject-heading keywords first, then the keywords extracted by the model within subject sub-areas, and then the keywords extracted by the model from the whole corpus.

Conclusion

Keywords obtained from the LDA topic model can be used on unknown collections to extract the latent subject content of the collection as a whole, but the method cannot be used to relate subjects precisely to documents in large corpora with heterogeneous and varied topics. In formal procedures for the subject description of individual documents, it can serve as a keyword suggestion system for human indexers.

Keywords: Automatic keyword extraction, topic modeling, LDA, similarity measurement, user evaluation


Automatic keyword extraction using Latent Dirichlet Allocation topic modeling: Similarity with golden standard and users' evaluation

Nosrat Riahinia*, Farzaneh Shadanpour, Keyvan Borna, GholamAli Montazer

Human Information Interaction, Volume:9 Issue: 3, 2022, PP 1 -22

Purpose

This study investigates the automatic keyword extraction from the table of contents of Persian e-books in the field of science using LDA topic modeling, evaluating their similarity with the golden standard, and users' viewpoints of the model keywords.

Methodology

This is mixed text-mining research in which LDA topic modeling is used to extract keywords from the table of contents of scientific e-books. The evaluation of the used approach has been done by two methods of cosine similarity computing and qualitative evaluation by users.

Findings

The tables of contents are medium-length texts with a trimmed mean of 260.02 words, about 20% of which are stop words. The cosine similarity between the golden standard keywords and the output keywords is 0.0932, which is very low. Users fully agreed that the keywords extracted with the LDA topic model represent the subject field of the whole corpus. For describing the subject of each document, however, the golden standard keywords, the keywords extracted with the LDA topic model in sub-domains of the corpus, and the keywords extracted from the whole corpus were successful in that order.
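To make the cosine similarity figure concrete, here is a minimal sketch of how the similarity between two keyword lists can be computed from term-frequency vectors. The abstract does not describe how the authors built their vectors, so the bag-of-terms representation, the function name `cosine_similarity`, and the keyword lists are all assumptions for illustration.

```python
import math
from collections import Counter


def cosine_similarity(terms_a, terms_b):
    """Cosine similarity between two bags of terms (term-frequency vectors)."""
    a, b = Counter(terms_a), Counter(terms_b)
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty list shares nothing with any other list
    return dot / (norm_a * norm_b)


# Hypothetical keyword lists, not data from the study.
standard_keywords = ["algebra", "geometry", "calculus", "statistics"]
model_keywords = ["calculus", "function", "equation", "statistics"]
score = cosine_similarity(standard_keywords, model_keywords)
print(round(score, 3))
```

A score near 0, like the 0.0932 reported, means the two keyword sets share very few terms, while a score of 1 means identical term distributions.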

Conclusion

The keywords extracted using the LDA topic model can be used in unspecified and unknown collections to extract hidden thematic content of the whole collection, but not to accurately relate each topic to each document in large and heterogeneous themes. In collections of texts in one subject field, such as mathematics or physics, etc., with less diversity and more uniformity in terms of the words used in them, more coherent and relevant keywords are obtained, but in these cases, the control of the relevance of keywords to each document is required. In formal subject analysis procedures and processes of individual documents, this approach can be used as a keyword suggestion system for indexing and analytical workforce.

Keywords: Keyword extraction, Topic modeling, Latent Dirichlet Allocation (LDA), Similarity evaluation, Users' evaluation

Research/Original Article. Original language: Persian

Topic modeling of Iranian researchers' articles on endocrinology and metabolism in the Web of Science citation database

Omolbanin Asadi Qadiklaei, Nadjla Hariri*, Maryam Khademi, Fahimeh Babalhavaeji

Scientometric Research Journal, Serial No. 15 (Spring and Summer 1401 SH), pp. 49-68

Objective

Probabilistic topic modeling methods are a set of algorithms whose main aim is to discover the hidden topical structure of large volumes of documents. This study aims to model the topics of Iranian researchers' articles on endocrinology and metabolism in the Web of Science citation database.

Methodology

This applied study was carried out using text mining and content analysis. All required data were retrieved from the Web of Science citation database using keywords registered in the Medical Subject Headings, with no time limit, up to 15 Aban 1397 SH (November 6, 2018). The document set was then analyzed with the latent Dirichlet allocation algorithm in MATLAB.

Findings

Topics were extracted as 10 subject categories of 20 words each. Endocrinology subspecialists then named the categories according to their relation to the various topics of endocrinology and metabolism, and each category was assigned a subject title.

Conclusion

The results indicate that the latent Dirichlet allocation model performed acceptably in producing topic categories for endocrinology. The extracted categories are homogeneous and topically well related to one another.

Keywords: Endocrinology and metabolism, topic modeling, latent Dirichlet allocation, text mining, Iran


Topic Modeling of Endocrinology and Metabolism Articles by Iranian Researchers in the Web of Science

Omolbanin Asadi Qadiklaei, Nadjla Hariri *, Maryam Khademi, Fahimeh Babalhavaeji

Scientometric research journal, Volume:8 Issue: 15, 2022, PP 49 -68

Purpose

Probabilistic topic modeling methods consist of a set of algorithms whose main purpose is to discover the hidden subject structure in a large volume of documents. The purpose of this study is to thematically model the articles of Iranian researchers in the field of endocrinology and metabolism in the citation database of Web of Science.

Methodology

The present study is applied research carried out using text mining and content analysis. All required data were retrieved from the Web of Science citation database using the keywords registered in the Medical Subject Headings, without a time limit, up to November 6, 2018. The whole set of documents was then analyzed in MATLAB using the latent Dirichlet allocation algorithm.

Findings

Subject categories were extracted as groups of 20 words in 10 subject categories. Then, by endocrinologists, the subject categories were named based on their relationship to various topics in the field of endocrinology and metabolism, and each category was assigned a subject title.

Conclusion

The results indicate that the implementation of the latent Dirichlet allocation model has an acceptable performance in presenting the categories of endocrinology and metabolism. The extracted subject categories have good homogeneity and thematic relevance with each other.

Keywords: Endocrinology, metabolism, Topic modeling, LDA, Text mining, Iran

Research/Original Article. Original language: Persian

Topic modeling and its application in research: a review of the specialized literature

Fatemeh Zarmehr, Ali Mansouri*, Hosein Karshenas

Library and Information Science Research, Year 11, No. 1 (Serial No. 21, Spring and Summer 1400 SH), pp. 23-39

Introduction

Topic modeling is a text mining technique that makes it possible to automatically discover unknown topics in a collection of documents, interpret the documents in terms of those topics, and use these interpretations to organize, summarize, and search texts. The main aims of this study are to introduce the concept and technique of topic modeling and its use in discovering topics and organizing information resources.

Methodology

This is a library (desk) study which, while introducing topic modeling, categorizes and reviews the applications of the technique according to its functional nature and presents examples of research that has used it.

Findings

Beyond their three main purposes (discovering hidden topics, interpreting documents in terms of topics, and ultimately organizing and classifying texts), topic modeling algorithms are used for discovering hidden topics and relationships in scientific fields, information retrieval, grouping documents by topic, discovering salient patterns and emerging events, clustering the concepts of scientific fields, analyzing conceptual evolution across historical periods, determining hierarchical relationships among the concepts of a particular scientific field or area, and enriching vocabularies.

Conclusion

Topic modeling, which relies on machine learning and draws on artificial intelligence, has been put forward as one of the new approaches to organizing information resources, and serious studies in this area are under way. Applying topic modeling algorithms to automate subject extraction and to discover the latent topics in a resource can therefore strengthen and update modern systems for organizing information resources.

Keywords: Text mining, topic modeling, topic discovery, information organization, topic detection


Topic Modeling and its Application in Research: A Review of Specialized Literature

Fatemeh Zarmehr, Ali Mansouri *, Hosein Karshenas

Library and Information Science Research, Volume:11 Issue: 1, 2021, PP 23 -39

Introduction

Topic modeling is one of the text mining techniques that allows you to discover unknown topics in a collection of documents, interpret documents based on these topics, and use these interpretations to organize, summarize, and search for texts automatically. Familiarity with the concept and technique of topic modeling, and its application in discovering topics and organizing information is one of the main goals of this research.

Methodology

The present study is a review-analytical type in which, while introducing topic modeling, it has categorized and reviewed the applications of this technique based on its performance and provided a sample of research that has used this technique.

Findings

Beyond their three main objectives of discovering hidden topics, interpreting documents based on topics, and finally organizing and classifying texts, topic modeling algorithms are used in discovering hidden topics and relationships in the fields of science, information retrieval, categorizing documents based on topics, discovering outstanding patterns and emerging events, clustering the concepts of scientific fields, analyzing the course of conceptual evolution during historical periods, determining the hierarchical relationships of concepts in a specific scientific field or area, and vocabulary enrichment.

Conclusion

Topic modeling based on machine learning and artificial intelligence knowledge has been proposed as one of the new approaches to organizing information resources and serious studies are being conducted in this field. Therefore, by using topic modeling algorithms in order to automate the extraction of the subject and discover the hidden issues in the source, it is possible to strengthen and update the new systems of organizing information resources.

Keywords: Text mining, Topic Modeling, Subject Discovery, Information Organization, Subject Diagnosis, Subject Allocation

Review Article. Original language: Persian

A study of the development and topical trends of knowledge and information science based on the LDA topic model

Maryam Baghmohammad, Ali Mansouri*, Mehrdad Cheashmehsohrabi

Journal of Information Processing and Management, Year 36, No. 2 (Serial No. 104, Winter 1399 SH), pp. 297-328

This study aims to identify topical trends in articles by Iranian authors in knowledge and information science using LDA topic modeling algorithms and a linear regression model. The study population comprises 709 articles with abstracts indexed in Scopus between 2008 and 2019. To meet the study's aims, the data were analyzed with text mining algorithms, specifically LDA topic modeling, using R. The analysis showed that the hot topics, which enjoy greater research popularity, include library services on social networks, research models, social capital, medical databases, data mining, trends in scientific production, interdisciplinary topics, cyberspace algorithms, knowledge management, social network studies, research approaches, and futures studies. The cold topics, which enjoy less research popularity, are electronic resources, information management systems, search engines, lending services, remote services, e-learning, e-government, journal evaluation indicators, evaluation of web resources, and digital libraries. The results showed that topical research in knowledge and information science in Iran has developed in step with the growth of technologies and global topics and has linked the field to the newer areas of data mining, artificial intelligence, semantic retrieval, ontology, information architecture, digital publishing, social networks, and databases.

Keywords: Topic modeling, LDA algorithm, trend analysis, hot and cold topics, popular and unpopular topics, knowledge and information science


Identification of Topic Development Process of Knowledge and Information Science Field Based on the Topic modeling (LDA)

Maryam Baghmohammad, Ali Mansouri*, Mehrdad Cheashmehsohrabi

Journal of Information Processing and Management, Volume:36 Issue: 2, 2021, PP 297 -328

The purpose of this study is to explore the subject structure, hidden knowledge, and development of new research subjects. The research is applied in purpose, descriptive in type, and uses trend analysis as its method. The population is all articles indexed in Scopus and written by Iranian authors in the field of knowledge and information science from 1992 to 2019, from which a targeted sample covering 2008 to 2019 was taken. The data were analyzed in R: after pre-processing, the abstracts of the papers were analyzed with topic modeling using the LDA algorithm. Hot and cold topics in the development of knowledge and information science were identified by setting the p-level at 0.05 and determining the alpha, beta, and theta parameters. The hot topics, which are more popular, are: Library Services on Social Networks, Research Models, Social Capital, Databases, Data Mining, Trends in Scientific Publication, Interdisciplinary Fields, Cyberspace Algorithms, and Knowledge Management. The cold topics, which are less popular, are: Electronic Resources, Information Management Systems, Search Engines, Book Loan Services, Distance Library Services, E-learning, E-government, Journal Evaluation Indicators, Web Source Evaluation, and Digital Libraries. The results showed that the field of knowledge and information science can grow along with globalization and new global issues, and that it is able to develop and expand into data mining, artificial intelligence, semantic retrieval, information architecture, digital publishing, social networking, and information technology.
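This record mentions a linear regression model and a 0.05 p-level for separating hot topics from cold ones. As a rough illustration of the idea only, the sketch below fits an ordinary least-squares line to a topic's yearly share and labels the topic by the sign of the slope. The yearly shares, topic names, and function names are hypothetical, and the significance test the authors applied is omitted here.

```python
def ols_slope(xs, ys):
    """Slope of the ordinary least-squares line through the points (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def label_trend(years, shares, tol=1e-9):
    """Label a topic 'hot' if its share rises over time, 'cold' if it falls."""
    slope = ols_slope(years, shares)
    if slope > tol:
        return "hot"
    if slope < -tol:
        return "cold"
    return "flat"


# Hypothetical yearly topic shares for 2008-2019, not data from the study.
years = list(range(2008, 2020))
data_mining = [0.02 + 0.005 * i for i in range(12)]
e_learning = [0.10 - 0.004 * i for i in range(12)]
print(label_trend(years, data_mining), label_trend(years, e_learning))
```

In a fuller analysis the slope would also be tested for significance, so that only topics whose trend passes the chosen p-level are labeled hot or cold.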

Keywords: Topic Modeling, LDA, Trend Analysis, Hot Topic, Cold Topic, log-likelihood, knowledge, Information Science, Text Mining

Research/Original Article. Original language: Persian

A review of text mining approaches and their performance in topic discovery and extraction

Ali Mansouri*, Fatemeh Zarmehr, Hossein Karshenas

Human Information Interaction, Year 7, No. 1 (Spring 1399 SH), pp. 15-26

Background and aim

This study examines four text mining methods and focuses on understanding and identifying their properties and limitations in topic discovery. The four methods are 1) latent semantic analysis (LSA), 2) probabilistic latent semantic analysis (PLSA), 3) latent Dirichlet allocation (LDA), and 4) the correlated topic model (CTM).

Method

This is a library (desk) study in which the literature on text mining and topic modeling is reviewed and analyzed.

Findings

Latent semantic analysis can be used to detect specific, distinctive topics in documents that address only one topic. The other three text mining methods focus on the topics and general orientation of a text. Probabilistic latent semantic analysis is applicable to documents that address a single topic, but unlike latent semantic analysis it is used to discover the general topics and themes of a text. Latent Dirichlet allocation, in contrast, is more applicable to documents that address several topics. The correlated topic model can be used to detect relationships between different topic categories.

Conclusion

Because they draw on semantic analysis, text mining approaches are well suited to discovering and extracting the topics of texts.

Keywords: Text mining, topic modeling, semantic analysis, topic discovery


A review of text mining approaches and their function in discovering and extracting a topic

Ali Mansouri*, Fatemeh Zarmehr, Hossein Karshenas

Human Information Interaction, Volume:7 Issue: 1, 2020, PP 15 -26

Background and aim

This study examines four text mining methods, focusing on understanding and identifying their properties and limitations in subject discovery.

Methodology

The study is an analytical review of the literature of text mining and topic modeling.

Findings

LSA can be used to classify specific and unique topics in documents that address only a single topic. The other three text mining methods focus on the topics and general orientation of the text. PLSA is applicable to documents dealing with a single topic but, unlike LSA, it is used to discover general themes and contexts. LDA, by contrast, is more applicable to documents that address several topics. The CTM method can be used to identify relationships between different subject categories.

Conclusion

Because they employ semantic analysis, text mining approaches are suitable for discovering and extracting the subjects of texts.

Keywords: Text mining, Topic Modeling, Semantic Analysis, Topic Discovery

Research/Original Article. Original language: Persian

An inquiry into the process of organizing and retrieving web texts based on aggregating semantic concepts for knowledge organization

Hamid Tabatabaee*, Mojtaba Kaffashan, Saeede Anbaee

Journal of Information Processing and Management, Year 34, No. 4 (Serial No. 98, Summer 1398 SH), pp. 1879-1904

Organizing and retrieving knowledge published on the web is regarded as one of the most important applications of text mining. Among the challenges of organizing a huge collection of texts as a text corpus are the high dimensionality of the features and the sparsity of the feature matrix. How features are selected and how they are reduced have a considerable effect on the accuracy of organizing and retrieving texts. Many studies have addressed these two challenges separately; this study addresses them together. After the texts belonging to 20 web newsgroups were identified and preprocessed, a bag (aggregate) of semantic concepts was built for the corpus using the latent Dirichlet allocation (LDA) topic modeling algorithm. To examine how much the words of the corpus contribute to each latent concept, the study looked at how the words of a corpus are weighted in the concepts extracted by LDA. Accordingly, a probability distribution over the topics was extracted for each text and used to organize and retrieve the knowledge it contains. For organization, the K-nearest-neighbors algorithm was used with Kullback-Leibler divergence, which measures the distance between two probability distributions, as the similarity measure. The tests showed that the organization accuracy of the proposed method is 82.5% when pointwise mutual information term weighting and the KL-KNN algorithm are used. The analyses showed that the method is about as accurate as methods that use deep learning techniques. In addition, the method used in this study was less complex in organizing and retrieving the texts studied.

Keywords: Text mining, text classification, topic modeling, retrieval, knowledge organization, ontology


An Investigation into the Process of Organizing and Retrieving Web Texts Based on the Integration of Semantic Concepts In order to organize knowledge

Saeede Anbaee, Hamid Tabatabaee*, Mojtaba Kaffashan

Journal of Information Processing and Management, Volume:34 Issue: 4, 2019, PP 1879 -1904

Improvement in information retrieval performance depends on the method of knowledge extraction from large amounts of text on the web. Text classification is one application of knowledge extraction with supervised machine learning methods. This paper proposes Kullback-Leibler divergence KNN for classifying features extracted through term weighting with the Latent Dirichlet Allocation (LDA) algorithm. LDA is presented as a non-negative matrix factorization method for topic modelling and dimensionality reduction of high-dimensional feature spaces. In traditional LDA, each component value is assigned using the information retrieval TF measure. While this weighting method seems very appropriate for information retrieval (IR), it is not clear that it is the best choice for text classification (TC) problems, because it does not leverage the information implicitly contained in the categorization task to represent documents. This paper introduces a new weighting method based on pointwise mutual information (PMI) for assessing the importance of a word for a specific latent concept; each document is then classified based on its probability distribution over the latent topics. Experimental results showed that with the PMI measure for term weighting and KNN with Kullback-Leibler distance, accuracy was 82.5%, with lower complexity than, and the same accuracy as, complex deep learning methods.
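The classification step described in this abstract, K-nearest neighbors over per-document topic distributions with Kullback-Leibler divergence as the distance, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the direction of the divergence (query against training document), the smoothing constant `eps`, the majority vote, and the toy distributions are all assumptions.

```python
import math
from collections import Counter


def kl_divergence(p, q, eps=1e-12):
    """KL divergence D(p || q) between two discrete distributions."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def kl_knn_predict(query, train, k=3):
    """Majority label among the k training items closest to query in KL terms.

    train is a list of (topic_distribution, label) pairs.
    """
    nearest = sorted(train, key=lambda item: kl_divergence(query, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical topic distributions over three latent topics.
train = [
    ([0.8, 0.1, 0.1], "sports"),
    ([0.7, 0.2, 0.1], "sports"),
    ([0.1, 0.8, 0.1], "politics"),
    ([0.2, 0.7, 0.1], "politics"),
    ([0.1, 0.1, 0.8], "science"),
]
prediction = kl_knn_predict([0.75, 0.15, 0.10], train, k=3)
print(prediction)
```

Because KL divergence is not symmetric, swapping the arguments can change which neighbors are nearest, so the direction is a design choice worth stating explicitly.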

Keywords: text mining, text classification, topic modeling, latent dirichlet allocation, document representation, Knowledge organization, Pointwise mutual Information

Research/Original Article. Original language: Persian


Note

Results are sorted by publication date.
The keyword was searched only in the keywords field of the articles. To exclude irrelevant results, the search covered only journals in the same subject area as the source journal.