A new Persian Text Summarization Approach based on Natural Language Processing and Graph Similarity

Author(s):

Tayyebeh Hosseinikhah*, Abbas Ahmadi, Azadeh Mohebi

Article Type:

Research/Original Article (holding a valid ranking)

Abstract:

A significant amount of available information is stored in textual databases that contain large collections of documents from different sources (such as news, articles, books, emails and web pages). The increasing visibility and importance of this class of information motivates the development of better automatic tools for evaluating textual resources.
Automatic text summarization is one way to avoid wasting users' time. Extractive text summarization selects the most important sentences of the input in order to shorten it while preserving the topics and subjects it covers.
In this paper, we aim to improve the accuracy of extracted summaries by combining natural language processing and text mining techniques. By modifying these algorithms and the sentence-scoring measures, we obtain higher accuracy than previously used techniques.
Part-of-speech tagging is used to calculate an importance coefficient for each word. This helps select the more meaningful words and phrases, which in turn improves the accuracy of the system.
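The abstract does not give the tagset or the coefficient values, so the following is only a minimal sketch of how a part-of-speech importance coefficient could feed into sentence scoring. It assumes pre-tagged input (for Persian this would come from an external tagger, not shown), hypothetical coarse tags `N`, `V` and `ADJ` with made-up coefficients, and a word weight of in-document frequency times coefficient. The names `POS_COEFF`, `word_weights` and `sentence_scores` are illustrative, not the authors' code.

```python
from collections import Counter

# Hypothetical coefficients for coarse POS tags; the paper's actual tagset
# and values are not given in the abstract.
POS_COEFF = {"N": 1.0, "ADJ": 0.7, "V": 0.5}
DEFAULT_COEFF = 0.1  # prepositions, punctuation and any unlisted tag


def word_weights(tagged_sentences):
    """Weight = frequency of the word in the document x its POS coefficient.

    tagged_sentences: list of sentences, each a list of (word, tag) pairs.
    A word seen with several tags keeps its highest coefficient.
    """
    freq = Counter(word for sent in tagged_sentences for word, _ in sent)
    coeff = {}
    for sent in tagged_sentences:
        for word, tag in sent:
            c = POS_COEFF.get(tag, DEFAULT_COEFF)
            coeff[word] = max(coeff.get(word, 0.0), c)
    return {word: freq[word] * coeff[word] for word in freq}


def sentence_scores(tagged_sentences):
    """Score each sentence as the mean weight of its words (0 if empty)."""
    weights = word_weights(tagged_sentences)
    return [
        sum(weights[word] for word, _ in sent) / len(sent) if sent else 0.0
        for sent in tagged_sentences
    ]


if __name__ == "__main__":
    doc = [
        [("text", "N"), ("is", "V"), ("long", "ADJ")],
        [("text", "N"), ("summary", "N")],
    ]
    print(sentence_scores(doc))
```

Averaging rather than summing keeps long sentences from winning on length alone; whether the paper normalizes this way is not stated.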
Graph-similarity methods are used to select sentences. Changing the weights of the selected sentences at each step addresses the redundancy problem.
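One possible reading of this step, sketched under stated assumptions: sentences become nodes of a cosine-similarity graph over bag-of-words vectors, each node's initial score is its degree centrality (sum of edge weights), and after each pick the remaining sentences are down-weighted by multiplying their score by (1 − similarity to the chosen sentence). The abstract only says that weights change at each step; the centrality measure, the penalty rule and the function names `cosine`, `similarity_graph` and `select_sentences` are hypothetical. The toy tokens stand in for Persian words.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(count * b[term] for term, count in a.items())
    denom = math.sqrt(sum(v * v for v in a.values()) * sum(v * v for v in b.values()))
    return dot / denom if denom else 0.0


def similarity_graph(sentences):
    """Weighted adjacency matrix of pairwise cosine similarities, no self-loops."""
    bags = [Counter(tokens) for tokens in sentences]
    n = len(bags)
    return [[0.0 if i == j else cosine(bags[i], bags[j]) for j in range(n)]
            for i in range(n)]


def select_sentences(sentences, k):
    """Greedily pick k sentences, penalizing those similar to earlier picks.

    sentences: list of token lists. Returns chosen indices in document order.
    """
    graph = similarity_graph(sentences)
    score = [sum(row) for row in graph]  # degree centrality as initial score
    remaining = set(range(len(sentences)))
    chosen = []
    while remaining and len(chosen) < k:
        best = max(remaining, key=lambda i: score[i])
        chosen.append(best)
        remaining.discard(best)
        for j in remaining:
            # Assumed redundancy penalty: scale by (1 - similarity to the pick).
            score[j] *= max(0.0, 1.0 - graph[best][j])
    return sorted(chosen)


if __name__ == "__main__":
    docs = [["a", "b", "c"], ["a", "b", "c", "c"], ["c", "d", "e"], ["f", "g", "h"]]
    print(select_sentences(docs, 2))  # [1, 2]
```

In this toy input, sentences 0 and 1 are near-duplicates. Ranking by centrality alone would return both, whereas the penalty pushes the second pick to sentence 2.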
Standard evaluation measures such as precision and recall are used to evaluate the results on a Persian corpus.
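For extractive summaries, precision and recall are often computed at the sentence level against a human reference extract: precision is the share of extracted sentences that appear in the reference, and recall is the share of reference sentences that were extracted. The sketch below assumes that setup (sentences identified by index) and adds F1 as their harmonic mean; the paper may instead compare at the word level, which the abstract does not specify. The function name is illustrative.

```python
def precision_recall_f1(system, reference):
    """Sentence-level precision, recall and F1 of an extractive summary.

    system: sentence ids chosen by the summarizer.
    reference: sentence ids in the human (gold) extract.
    """
    sys_set, ref_set = set(system), set(reference)
    overlap = len(sys_set & ref_set)
    precision = overlap / len(sys_set) if sys_set else 0.0
    recall = overlap / len(ref_set) if ref_set else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # 2 of 3 extracted sentences are in the 4-sentence reference extract.
    print(precision_recall_f1([1, 2, 5], [1, 2, 3, 4]))
```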

Keywords:

Extractive Summarization, Natural Language Processing, Text Mining, Part of Speech Tagging, Similarity Graph

Language:

Persian

Published:

Journal of Information Processing and Management, Volume: 33, Issue: 2, 2018

Pages:

885 to 944

magiran.com/p1805566

Approved scientific journal

پژوهشنامه پردازش و مدیریت اطلاعات

Journal of Information Processing and Management

Humanities quarterly

ISSN: 2251-8223 eISSN: 2251-8231

Published under the title "Information Sciences" until autumn 1384 SH (2005).

Proprietor:

Iranian Research Institute for Information Science and Technology

Managing Director:

Dr. Mohammad Hassanzadeh

Editor-in-Chief:

Dr. Seyed Rahmatollah Fattahi

Journal phone: 021-66494980
