topic modeling
in publications of the Technical and Engineering group
Panic buying, characterized by consumers purchasing unusually large quantities of products in response to disasters, perceived threats, or anticipated price increases or shortages, remains a multifaceted phenomenon requiring further investigation. The COVID-19 crisis has provided a unique opportunity to conduct thorough analyses of panic buying behavior in a real-world context. Furthermore, the pandemic has underscored the importance of understanding panic buying dynamics, given its significant impact on consumer behavior and supply chain resilience. While many studies have concentrated on the psychological aspects of this phenomenon, there exists a gap in exploring its impact on products and goods. Therefore, there is a critical need to examine its effects across various product categories. In this study, we employed innovative topic modeling techniques to examine panic buying behavior and its implications during the COVID-19 crisis. Leveraging data from the X platform, our study adopts a novel approach integrating Sentence-BERT and BERTopic methodologies to identify key topics across diverse product categories. By providing insights into the outcomes of panic buying, this study contributes to a more comprehensive understanding of consumer behavior during crises. Moreover, our findings hold considerable significance for policymakers and supply chain managers, offering insights to develop targeted interventions aimed at mitigating the impact of panic buying on supply chains and ensuring efficient resource allocation during future crises.
Keywords: Consumer Behavior, Panic Buying, Topic Modeling, COVID-19, Pandemic -
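As a rough illustration of the kind of pipeline described above (not the authors' actual code), the following sketch feeds Sentence-BERT embeddings into BERTopic to surface product-related topics in a tweet corpus; the embedding model name and the minimum topic size are illustrative assumptions.

```python
# Hedged sketch (not the authors' code): Sentence-BERT embeddings fed into
# BERTopic to surface product-related topics in a tweet corpus. The embedding
# model name and min_topic_size are illustrative assumptions.
from sentence_transformers import SentenceTransformer
from bertopic import BERTopic

def extract_panic_buying_topics(tweets):
    """Fit BERTopic on a list of tweet strings and return the topic summary."""
    embedder = SentenceTransformer("all-MiniLM-L6-v2")       # assumed model choice
    topic_model = BERTopic(embedding_model=embedder, min_topic_size=20)
    topics, probs = topic_model.fit_transform(tweets)        # one topic id per tweet
    return topic_model.get_topic_info()                      # keywords per discovered topic

# Usage (assuming `tweets` holds the collected X/Twitter texts):
# print(extract_panic_buying_topics(tweets).head(10))
```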
Sentiment analysis is a process through which the beliefs, sentiments, allusions, behaviors, and tendencies in written language are analyzed using Natural Language Processing (NLP) techniques. This process essentially comprises discovering and understanding people's positive or negative sentiments regarding a product or entity in the text. The increased significance of sentiment analysis has coincided with the growth of user-generated content such as surveys, blogs, and Twitter posts. The present study takes advantage of a topic modeling approach based on latent Dirichlet allocation (LDA) to extract and represent thematic features, as well as a support vector machine (SVM) to classify and analyze sentiments at the aspect level. LDA seeks to extract latent topics by observing all the texts, assigning to each word a probability of belonging to each topic. Through this approach, the important features that represent the thematic aspects of the text are extracted and fed to the SVM for classification. The SVM is a powerful classification algorithm that can accurately separate complex data by mapping it into a higher-dimensional space and constructing an optimal separating hyperplane. Empirical results on real datasets indicate that the proposed model is promising and performs better than the baseline methods in terms of precision (89.78% on average), recall (78.92% on average), and F-measure (83.50% on average).
Keywords: Natural Language Processing, Sentiment Analysis, Aspect-Level, Topic Modeling, LDA -
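As a minimal sketch of the general idea (LDA document-topic proportions used as features for an SVM sentiment classifier), the following example uses scikit-learn; the placeholder texts, topic count, and the choice of a linear SVM are assumptions rather than the paper's configuration.

```python
# Hedged sketch: LDA document-topic features used as input to an SVM classifier.
# Texts, labels, and hyperparameters are illustrative placeholders.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline

texts = ["battery life is great",
         "screen broke after a week",
         "the camera quality is amazing"]          # placeholder reviews
labels = ["positive", "negative", "positive"]      # placeholder sentiment labels

# Bag-of-words counts -> latent topic proportions -> linear SVM on those proportions.
model = make_pipeline(
    CountVectorizer(stop_words="english"),
    LatentDirichletAllocation(n_components=5, random_state=0),
    LinearSVC(),
)
model.fit(texts, labels)
print(model.predict(["delivery was quick but the battery is awful"]))
```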
Vegetarianism is one of the movements that has generated considerable feedback on social networks. The content published by users reflects their feelings and opinions about this movement and its various aspects. Accordingly, a dataset of more than sixty thousand tweets about vegetarianism published in 2023 was collected and used to extract users' sentiments toward different aspects of vegetarianism. First, a method based on the RoBERTa language model is presented for analyzing the implicit sentiment embedded in the tweets. Then, using LDA topic modeling, a number of aspects and topics related to vegetarianism are extracted. In the next step, a method based on the DeBERTa language model is used to analyze the sentiment of the tweets toward the extracted aspects. Various frequency and sentiment-distribution charts for the different aspects of vegetarianism are examined, and the results of the RoBERTa-based sentiment analysis are discussed alongside those of the DeBERTa-based analysis. Analysis of the data with the DeBERTa-based model shows that users mostly published tweets with a positive orientation about the plant and lifestyle aspects, whereas content about the Animal aspect was predominantly negative. For each of the Diet and Co aspects, with similar proportions, most tweets are positive or neutral. In the course of the discussion, several pieces of implicit knowledge related to this topic are also examined.
Journal of New Achievements in Electrical, Computer and Technology, Volume:3 Issue: 7, 2023, PP 36-53
Vegetarianism is one of the trends that has received a lot of feedback on social networks. The content published by users reflects their feelings and opinions towards this trend and its various aspects. In this regard, a dataset containing more than sixty thousand tweets published in 2023 about vegetarianism was collected. This dataset was used to extract user sentiment towards different aspects of vegetarianism. First, a method based on the RoBERTa language model was proposed to analyze the implicit sentiment hidden in tweets. Then, using the Latent Dirichlet Allocation topic modeling approach, some relevant aspects and topics related to vegetarianism were extracted. In the next step, a method based on the DeBERTa language model was used to analyze tweet sentiment towards the different aspects that had been extracted. Various frequency and sentiment distribution charts for different aspects in the field of vegetarianism were examined. The results of sentiment analysis based on the RoBERTa and DeBERTa models were compared side by side. Data analysis using the DeBERTa model showed that users had mostly tweeted positive sentiments regarding the plant and lifestyle aspects. However, for the Animal aspect, most tweets were negative. For both the Diet and Company aspects, most tweets were positive or neutral, with values close to each other. During the discussion, some implicit knowledge related to this topic was also examined.
Keywords: Aspect Based Sentiment Analysis, Topic Modeling, Natural Language Processing, Text Processing -
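A minimal sketch of one way to combine a RoBERTa sentiment model with DeBERTa-based aspect-level analysis using the Hugging Face transformers library is shown below; the specific checkpoints, aspect labels, and hypothesis template are illustrative assumptions, not the paper's configuration.

```python
# Hedged sketch: tweet-level sentiment with a RoBERTa model, then aspect-level
# sentiment via zero-shot NLI with a DeBERTa model. The checkpoints, aspects,
# and hypothesis template are illustrative assumptions.
from transformers import pipeline

tweet = "Going vegan changed my lifestyle, but I still worry about animal farming."

# Tweet-level (implicit) sentiment with a RoBERTa model fine-tuned on Twitter data.
sentiment = pipeline("sentiment-analysis",
                     model="cardiffnlp/twitter-roberta-base-sentiment-latest")
print(sentiment(tweet))

# Aspect-level sentiment: for each LDA-derived aspect, ask an NLI model whether the
# tweet expresses a positive, negative, or neutral stance toward that aspect.
zero_shot = pipeline("zero-shot-classification",
                     model="MoritzLaurer/DeBERTa-v3-base-mnli-fever-anli")
for aspect in ["plant", "lifestyle", "animal", "diet"]:
    result = zero_shot(tweet,
                       candidate_labels=["positive", "negative", "neutral"],
                       hypothesis_template=f"The sentiment towards {aspect} is {{}}.")
    print(aspect, result["labels"][0])
```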
Text classification is one of the important applications of natural language processing. To classify news texts, they must first be represented in a suitable way. There are various methods for text representation, but most of them are general-purpose and use only local, first-order word co-occurrence information. In this paper, an unsupervised method for representing news texts is presented that uses global co-occurrence information and topical information to represent documents. Topical information, in addition to providing a more abstract representation of the text, also contains higher-order co-occurrence information. Global co-occurrence information and topical information are complementary; therefore, in this paper both are employed to produce a richer representation for text classification. The proposed method was tested on the R8 and 20-Newsgroups corpora, which are well-known corpora for text classification, and compared with various methods. Compared with the other methods, the proposed method showed an accuracy improvement of 3%.
Text classification is one of the main tasks of natural language processing (NLP). In this task, documents are classified into pre-defined categories. A great deal of news spreads on the web, and a text classifier can categorize news automatically, which facilitates and accelerates access to it. The first step in text classification is to represent documents in a way that a classifier can distinguish. There is an abundance of document representation methods in the literature, which can be divided into bag-of-words models, graph-based methods, word-embedding pooling, neural-network-based methods, and topic-modeling-based methods. Most of these methods use only local word co-occurrences to generate document embeddings. Local word co-occurrences miss the overall view of a document and the topical information that can be very useful for classifying news articles. In this paper, we propose a method that utilizes the term-document and document-topic matrices to generate richer representations for documents. The term-document matrix represents a document in a specific way, where each word plays a role in representing the document; however, the generalization power of this type of representation for text classification and information retrieval is limited. This matrix is created based on global co-occurrences (at the document level), which are more suitable for text classification than local co-occurrences. The document-topic matrix represents a document in an abstract way, and higher-order co-occurrences are used to generate it. This type of representation therefore has good generalization power for text classification, but it is so high-level that it misses rare words, which can be very useful features for text classification. The proposed approach is an unsupervised document-embedding model that utilizes the benefits of both the document-topic and term-document matrices to generate a richer representation for documents. The method constructs a tensor from these two matrices and applies tensor factorization to reveal the hidden aspects of the data. The proposed method is evaluated on the task of text classification on the 20-Newsgroups and R8 datasets, which are benchmark datasets in the news classification area. The results show the superiority of the proposed model with respect to the baseline methods, improving text classification accuracy by 3%.
Keywords: Text classification, Document representation, Document Embedding, Topic modeling, word co-occurrences -
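As an illustration of combining a global term-document view with a document-topic view, the following sketch simply concatenates a TF-IDF matrix with LDA topic proportions; the paper itself builds a tensor from the two matrices and factorizes it, which is not reproduced here, and the corpus and parameters are placeholders.

```python
# Hedged sketch: concatenating a global term-document view (TF-IDF) with a
# document-topic view (LDA). The paper builds a tensor from these matrices and
# factorizes it; that step is not reproduced here. Corpus and sizes are placeholders.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = ["stocks fell sharply amid recession fears",
        "the team won the championship final",
        "new vaccine shows promising trial results"]

# Global lexical view: term-document matrix (here TF-IDF weighted).
term_doc = TfidfVectorizer(stop_words="english").fit_transform(docs)

# Abstract topical view: document-topic matrix from LDA over word counts.
counts = CountVectorizer(stop_words="english").fit_transform(docs)
doc_topic = LatentDirichletAllocation(n_components=5, random_state=0).fit_transform(counts)

# Richer document embedding combining both views.
doc_embeddings = np.hstack([term_doc.toarray(), doc_topic])
print(doc_embeddings.shape)   # (n_documents, n_terms + n_topics)
```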
Considering the growth of research on improving the performance of non-factoid question answering systems, there is a need for an open-domain non-factoid dataset. Some datasets are available for non-factoid and even how-type questions, but no appropriate dataset is available that comprises only open-domain why-type questions covering the full range of question formats. Why-questions play a significant role and are asked in every domain. They are more complex and difficult for a system to answer automatically, as why-questions seek reasoning about the task involved. They are prevalent and asked out of curiosity by real users, and thus answering them depends on the users' needs, knowledge, context, and experience. This paper develops a customized web crawler for gathering a set of why-questions, irrespective of domain, from five popular sources: Answers.com, Yahoo! Answers, Suzan Verberne's open-source dataset, Quora, and Ask.com. Along with the questions, their category, document title, and appropriate answer candidates are also maintained in the dataset. With this, the distribution of why-questions according to their type and category is illustrated. To the best of our knowledge, it is the first sufficiently large dataset of 2,000 open-domain why-questions with their relevant answers, and it should help stimulate research focused on improving the performance of non-factoid why-question answering systems (why-QAS).
Keywords: Non-Factoid questions, web crawler, Latent Dirichlet Allocation, Topic Modeling, Natural Language Processing -
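As a hedged sketch of what such a crawler might look like, the snippet below collects why-questions from a listing page; the URL and CSS selector are hypothetical placeholders, and real QA sites impose their own markup, pagination, and crawling policies.

```python
# Hedged sketch: a minimal crawler that collects "why"-questions from a question
# listing page. The URL and CSS selector are hypothetical placeholders; each real
# QA site has its own markup, pagination, and crawling policy.
import requests
from bs4 import BeautifulSoup

LISTING_URL = "https://example.com/questions?page=1"     # hypothetical listing page

response = requests.get(LISTING_URL, timeout=10)
soup = BeautifulSoup(response.text, "html.parser")

why_questions = []
for link in soup.select("a.question-title"):             # hypothetical selector
    title = link.get_text(strip=True)
    if title.lower().startswith("why"):
        why_questions.append({"question": title, "url": link.get("href")})

print(f"Collected {len(why_questions)} why-questions")
```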
In recent years, we have seen an increase in the production of films in a variety of categories and genres. Many of these products contain concepts that are inappropriate for children and adolescents. Hence, parents are concerned that their children may be exposed to these products. As a result, a smart recommendation system that suggests appropriate movies based on the user's age range could be a useful tool for parents. Existing movie recommender systems use quantitative factors and metadata, which leads to less attention being paid to the content of the movies. This research is motivated by the need to extract movie features using information retrieval methods in order to provide effective suggestions. The goal of this study is to propose a movie recommender system based on topic modeling and text-based age ratings. The proposed method uses latent Dirichlet allocation (LDA) modeling to identify hidden associations between words, document topics, and the levels of expression of each topic in each document. Machine learning models are then used to recommend age-appropriate movies. It is demonstrated that the proposed method can determine the user's age and recommend age-appropriate movies with 93% accuracy, which is highly satisfactory.
Keywords: Recommendation Systems, Text Classification, topic modeling
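A minimal sketch of the general approach (LDA topic proportions of movie descriptions fed to a classifier that predicts an age rating) is given below; the toy data, topic count, and classifier are illustrative assumptions rather than the paper's setup.

```python
# Hedged sketch: LDA topic proportions of movie descriptions as features for an
# age-rating classifier. The toy data, topic count, and classifier are placeholders.
import numpy as np
from gensim.corpora import Dictionary
from gensim.models import LdaModel
from sklearn.ensemble import RandomForestClassifier

plots = [["alien", "war", "blood", "revenge"],
         ["princess", "friendship", "magic", "forest"],
         ["detective", "murder", "city", "night"]]        # tokenized plot summaries (placeholder)
ratings = ["adult", "kids", "adult"]                       # age labels (placeholder)

dictionary = Dictionary(plots)
corpus = [dictionary.doc2bow(tokens) for tokens in plots]
lda = LdaModel(corpus=corpus, id2word=dictionary, num_topics=5, random_state=0)

# Dense document-topic matrix: one row per movie, one column per latent topic.
features = np.array([
    [prob for _, prob in lda.get_document_topics(bow, minimum_probability=0.0)]
    for bow in corpus
])

clf = RandomForestClassifier(random_state=0).fit(features, ratings)
print(clf.predict(features))
```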
-
Scientia Iranica, Volume:28 Issue: 3, May-Jun 2021, PP 1830-1852
Information Technology (IT), Management, and Industrial Engineering are correlated academic disciplines whose publications have risen significantly over the last decades. The aim of this study is to analyze research evolution, determine the important topics and areas, and depict the trend of interdisciplinary topics in these domains. To accomplish this, text mining techniques are used, and a combination of bibliographic analysis and topic modeling is applied to the disciplines' publications in the WOS repository over the last 20 years. In the topic extraction process, a heuristic function was proposed for keyword extraction, and some new applicable criteria were defined to compare the topics. Moreover, a novel approach was proposed to determine the high-level category of each topic. The results determine the hot, important topics, and increasing, decreasing, and stable topics are identified. Subsequently, comparing the high-level research areas confirmed the strong scientific relationships between them. This study presents deep knowledge about the internal research evolution of the domains and illustrates the effect of topics on each other over the past 20 years. Furthermore, the methodology of this study could be applied to determine interdisciplinary topics and observe the research evolution of other academic domains.
Keywords: Research Evolution, Topic Modeling, Trend analysis, Information Technology (IT), industrial engineering, Management -
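As a small illustration of how topic trends over time can be computed once a topic model is fitted, the sketch below averages document-topic proportions per publication year and fits a linear slope per topic; the data here are synthetic placeholders.

```python
# Hedged sketch: tracking topic prevalence over time. Assumes a document-topic
# matrix (e.g. from LDA) and a publication year per document; a topic's trend is
# its average share of the corpus in each year. Data here are synthetic placeholders.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
doc_topic = rng.dirichlet(np.ones(5), size=200)           # placeholder: 200 papers, 5 topics
years = rng.integers(2000, 2020, size=200)                # placeholder publication years

df = pd.DataFrame(doc_topic, columns=[f"topic_{i}" for i in range(5)])
df["year"] = years

trend = df.groupby("year").mean()                         # yearly average topic proportions
slopes = trend.apply(lambda col: np.polyfit(trend.index, col, 1)[0])
print(slopes.sort_values())                               # increasing vs. decreasing topics
```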
Extracting knowledge from the data available on the web has become a challenge in the field of information retrieval, given its volume and variety. Among the problems in this field, expert retrieval and ranking, which aims to retrieve and rank people with expertise in the topic of a user's query, has attracted the attention of many researchers. The most important challenge in expert retrieval is detecting the degree of relevance between query words and the documents written by expert candidates. A fundamental problem in this area is the vocabulary gap between query words and the documents of the expert candidates. In this paper, two new translation models are presented to model the vocabulary gap: the first is a cluster-based probabilistic model and the second is based on topic modeling. In both models, query words are translated into a set of query-related words that are more indicative of an area of expertise. After translating the words, a mixture model is used for retrieval. The proposed models are evaluated and analyzed on the Stack Overflow test collection. The results indicate an increase in the mean average precision of the proposed method compared with other expert retrieval methods.
With respect to the increasing volume and variety of information available on the Web, it is very difficult to find the required knowledge within the massive amount of data. Question-answering systems have been created to ease access to knowledge in this massive amount of data. The most important factor in the expert finding problem is the ability to detect the relationship between query words and the documents written by the candidate experts. A challenging issue in this area is the vocabulary gap between query words and the documents of the candidate experts. In this paper, two new translation models are proposed to address the vocabulary gap: the first is a cluster-based probabilistic model, and the second is based on topic modeling. In these models, the query words are translated into a collection of query-related words that better represent an area of expertise. Then, using these words and a simple composite model, the experts are retrieved. The proposed models are implemented and evaluated on the Stack Overflow test set, and the outputs are analyzed. The results indicate an increase in the mean average precision of the proposed method compared with other expert finding methods.
Keywords: Expertise retrieval, translation model, question answering systems, topic modeling, vocabulary gap -
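As a hedged sketch of the topic-based "translation" idea, the snippet below maps each query word to the top words of its most probable LDA topic; the corpus, topic count, and expansion size are placeholders, and the paper's mixture retrieval model is not reproduced.

```python
# Hedged sketch: "translating" query words into topically related words. Each query
# term is mapped to the top words of its most probable LDA topic; the expanded
# query would then feed a retrieval model. Corpus and parameters are placeholders.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

posts = ["how to join two tables in sql",
         "sort a python list of dictionaries by key",
         "optimize a slow sql query with an index"]        # placeholder candidate-expert posts

vectorizer = CountVectorizer(stop_words="english")
counts = vectorizer.fit_transform(posts)
vocab = np.array(vectorizer.get_feature_names_out())
lda = LatentDirichletAllocation(n_components=3, random_state=0).fit(counts)

def translate(word, top_n=5):
    """Map a query word to the top words of its most probable topic."""
    if word not in vectorizer.vocabulary_:
        return [word]
    topic = lda.components_[:, vectorizer.vocabulary_[word]].argmax()
    return vocab[lda.components_[topic].argsort()[::-1][:top_n]].tolist()

print(translate("sql"))
```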
Text clustering and classification are two main tasks of text mining. Feature selection plays a key role in the quality of the clustering and classification results. Although word-based features such as term frequency-inverse document frequency (TF-IDF) vectors have been widely used in different applications, their shortcomings in capturing the semantic concepts of text have motivated researchers to use semantic models for document vector representation. Latent Dirichlet allocation (LDA) topic modeling and doc2vec neural document embedding are two well-known techniques for this purpose. In this paper, we first study the conceptual difference between the two models and show that they behave differently and capture semantic features of texts from different perspectives. We then propose a hybrid approach for document vector representation to benefit from the advantages of both models. The experimental results on 20-Newsgroups show the superiority of the proposed model compared to each of the baselines on both text clustering and classification tasks. We achieved a 2.6% improvement in F-measure for text clustering and a 2.1% improvement in F-measure for text classification compared to the best baseline model.
Keywords: Text mining, Semantic representation, Topic modeling, Neural document embedding
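As a minimal sketch of a hybrid representation in this spirit, the snippet below concatenates an LDA topic distribution with a doc2vec embedding using gensim; the toy corpus, dimensionalities, and the simple concatenation scheme are illustrative assumptions.

```python
# Hedged sketch: hybrid document vectors that concatenate an LDA topic distribution
# with a doc2vec embedding. The toy corpus and hyperparameters are placeholders, and
# the paper's exact combination scheme may differ from plain concatenation.
import numpy as np
from gensim.corpora import Dictionary
from gensim.models import LdaModel
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

docs = [["space", "shuttle", "launch", "orbit"],
        ["goal", "match", "league", "season"],
        ["election", "vote", "senate", "policy"]]          # tokenized documents (placeholder)

# Topic view: LDA document-topic proportions.
dictionary = Dictionary(docs)
corpus = [dictionary.doc2bow(d) for d in docs]
lda = LdaModel(corpus=corpus, id2word=dictionary, num_topics=4, random_state=0)
topic_vecs = np.array([
    [p for _, p in lda.get_document_topics(bow, minimum_probability=0.0)]
    for bow in corpus
])

# Distributed view: doc2vec embeddings.
tagged = [TaggedDocument(words=d, tags=[i]) for i, d in enumerate(docs)]
d2v = Doc2Vec(tagged, vector_size=16, min_count=1, epochs=20, seed=0)
d2v_vecs = np.array([d2v.dv[i] for i in range(len(docs))])

# Hybrid representation for downstream clustering or classification.
hybrid = np.hstack([topic_vecs, d2v_vecs])
print(hybrid.shape)
```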