A Customized Web Spider for Why-QA Pairs Corpus Preparation

Author(s):

Manvi Breja

Article Type:

Research/Original Article (with a valid accreditation rating)

Abstract:

Given the growing body of research on improving the performance of non-factoid question answering systems, there is a need for an open-domain non-factoid dataset. Some datasets exist for non-factoid and even how-type questions, but no suitable dataset consists solely of open-domain why-type questions covering the full range of question formats. Why-questions play a significant role and are asked in every domain. Because they seek the reasoning behind the event or task involved, they are more complex and harder for a system to answer automatically. They are common, are asked out of curiosity by real users, and their answers therefore depend on each user's needs, knowledge, context, and experience. This paper develops a customized web crawler that gathers why-questions, irrespective of domain, from five popular question answering sources available on the Web: Answers.com, Yahoo! Answers, Suzan Verberne's open-source dataset, Quora, and Ask.com. Along with each question, the dataset stores its category, the document title, and appropriate answer candidates. The distribution of why-questions by type and category is also illustrated. To the best of our knowledge, this is the first large enough dataset of open-domain why-questions (2000 questions with their relevant answers), and it should help stimulate further research on improving the performance of non-factoid why-question answering systems (why-QAS).
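The abstract describes the crawler only at a high level. As an illustration, the post-processing step it implies (keeping only why-questions, storing each with its category, document title, and answer candidates, and tallying the distribution by type and category) might be sketched as below. This is a minimal sketch under stated assumptions, not the author's implementation: the `WhyQARecord` layout, the whole-word "why" rule for detecting why-questions, and the notion of "type" as "why" plus the following word are all hypothetical. Fetching and parsing pages from the five sources is omitted, since their page structures are not described.

```python
import re
from collections import Counter
from dataclasses import dataclass, field


# Hypothetical record layout: the abstract says each entry keeps the question,
# its category, the source document title, and candidate answers.
@dataclass
class WhyQARecord:
    question: str
    category: str
    document_title: str
    answer_candidates: list = field(default_factory=list)


# Assumption: a question is a why-question if "why" appears as a whole word,
# case-insensitively (so "Why is..." matches but "whyever" does not).
WHY_PATTERN = re.compile(r"\bwhy\b", re.IGNORECASE)
# Hypothetical "type": "why" followed by the next word, e.g. "why is", "why do".
TYPE_PATTERN = re.compile(r"\bwhy\s+(\w+)", re.IGNORECASE)


def is_why_question(text: str) -> bool:
    """Return True if the text contains the word 'why'."""
    return WHY_PATTERN.search(text) is not None


def question_type(text: str) -> str:
    """Label a why-question by 'why' plus the word after it (lowercased)."""
    match = TYPE_PATTERN.search(text)
    return f"why {match.group(1).lower()}" if match else "why"


def filter_why(records):
    """Keep only records whose question text is a why-question."""
    return [r for r in records if is_why_question(r.question)]


def distributions(records):
    """Count records per category and per hypothetical question type."""
    by_category = Counter(r.category for r in records)
    by_type = Counter(question_type(r.question) for r in records)
    return by_category, by_type


if __name__ == "__main__":
    # Illustrative records only; not data from the paper's corpus.
    records = [
        WhyQARecord("Why is the sky blue?", "Science", "Sky color",
                    ["Sunlight is scattered by air molecules."]),
        WhyQARecord("How do magnets work?", "Science", "Magnets"),
        WhyQARecord("Why do cats purr?", "Animals", "Cat behavior"),
    ]
    why_records = filter_why(records)
    by_category, by_type = distributions(why_records)
    print(dict(by_category))  # {'Science': 1, 'Animals': 1}
    print(dict(by_type))      # {'why is': 1, 'why do': 1}
```

Keeping detection and tallying as separate small functions makes it easy to swap the keyword rule for a stricter one (for example, requiring "why" at the start of the question) without touching the counting logic.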

Keywords:

Non-Factoid questions, web crawler, Latent Dirichlet Allocation, Topic Modeling, Natural Language Processing

Language:

English

Published:

Journal of Information Systems and Telecommunication, Volume:11 Issue: 1, Jan-Mar 2023

Pages:

41 to 47

https://www.magiran.com/p2552748

Accredited scientific journal

Journal of Information Systems and Telecommunication

Quarterly engineering journal published in English

ISSN: 2322-1437 eISSN: 2345-2773

Proprietor:

Jahad Daneshgahi (Academic Center for Education, Culture and Research)

Managing Director:

Eng. Habibollah Asghari

Editor-in-Chief:

Dr. Masoud Shafiee

Journal telephone: 021-88930150
