Improve Text Representation Using RST-based Deep Neural Networks
Finding a highly informative, low-dimensional representation for texts, specifically long texts, is one of the main challenges for natural language processing (NLP) tasks. For texts longer than sentences or a paragraph, finding a good representation beyond the bag-of-words model without losing word order is still a challenge. This representation should capture the semantic and syntactic information of the text while retaining relevance for large-scale similarity search and accurate text classification. We propose the utilization of Rhetorical Structure Theory (RST) to consider the text structure in the representation. RST creates a tree-structure format for the text document and model the importance and relationship between sentences or phrases. In this paper, we examine the effect of using this structure on two different NLP tasks. In information retrieval, to embed document relevance in distributed representation, we use a Siamese neural network to jointly learn document representations. Our Siamese network consists of two sub-networks of recursive neural networks (RNN) built over the RST tree. For this task, we use a subset of Reuters’s news corpus and BBC news dataset. The results show that our approach outperforms conventional text representations like tf_idf, LDA, LSA and word vector averaging. The proposed representation beats the best conventional method by %6 and %3 in precision at k retrieved documents on BBC and Reuters datasets, respectively. In the sentiment analysis task, first, we use an rst-based recursive neural network to represent movie reviews and classify the polarity of people’s opinions. Then we propose to use the nucleus-satellite information of a node in the rst-tree to build an attention mechanism by deep RNN to generate better discourse representations. We test the effectiveness of our approach on sentiment analysis task, and we prove that considering the importance of the text span improves sentiment analysis performance by %3 on the internet movie review database. In this paper, we improve the text representation by the rst-based deep neural network. We can evaluate this approach on the other languages to show the effectiveness of using the structure format of the text.
- حق عضویت دریافتی صرف حمایت از نشریات عضو و نگهداری، تکمیل و توسعه مگیران میشود.
- پرداخت حق اشتراک و دانلود مقالات اجازه بازنشر آن در سایر رسانههای چاپی و دیجیتال را به کاربر نمیدهد.