Density Measure in Context Clustering for Distributional Semantics of Word Sense Induction

Author(s):

Masood Ghayoomi*

Article Type:

Research/Original Article (accredited ranking)

Abstract:

Word Sense Induction (WSI) aims to induce word senses from data without using prior knowledge. Because no labeled data is used, researchers have turned to clustering techniques for this task. Clustering algorithms are of two types: parametric and non-parametric. Although non-parametric clustering algorithms are in principle better suited to inducing word senses, their shortcomings make them impractical. Parametric clustering algorithms, meanwhile, show competitive results, but they suffer from a major problem: the number of clusters must be fixed in advance.
The main contribution of this paper is to show that the silhouette score, normally used as an internal evaluation metric, can be used to measure cluster density in a parametric clustering algorithm such as K-means, and that doing so in the WSI task captures word senses better than the state-of-the-art models. To this end, a word embedding approach is used to represent words’ contextual information as vectors. To capture the context in the vectors, we propose two experimental modes: building the vectors from either the whole sentence or a limited number of surrounding words in the local context of the target word. Experimental results based on the V-measure evaluation metric show that the two modes of the proposed model outperform the state-of-the-art models by 4.48% and 5.39%. Moreover, the average and maximum numbers of clusters in the outputs of the proposed models are close to those in the gold data.
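
The idea of letting the silhouette score choose the number of K-means clusters can be illustrated with a minimal sketch. It is not the paper's implementation: the 2-D points stand in for context-embedding vectors of a single target word, the farthest-first initialisation is an assumption made here for reproducibility, and the function names (`kmeans`, `silhouette_score`, `choose_k_by_silhouette`) are hypothetical.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(points, k, max_iter=100):
    # Deterministic farthest-first initialisation (an assumption of this
    # sketch; the abstract does not say how the centers are initialised).
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(euclidean(p, c) for c in centers)))
    labels = []
    for _ in range(max_iter):
        # Assign every point to its nearest center.
        labels = [min(range(k), key=lambda j: euclidean(p, centers[j])) for p in points]
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Move the center to the cluster mean; keep it if the cluster is empty.
            updated.append(tuple(sum(col) / len(members) for col in zip(*members))
                           if members else centers[j])
        if updated == centers:
            break
        centers = updated
    return labels

def silhouette_score(points, labels):
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        return -1.0  # undefined for one cluster; treated as the worst case here
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster.
        a = sum(euclidean(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster.
        b = min(sum(euclidean(p, q) for q in other) / len(other)
                for name, other in clusters.items() if name != lab)
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)

def choose_k_by_silhouette(points, k_values):
    # Run K-means for each candidate k and keep the densest clustering.
    best = None
    for k in k_values:
        labels = kmeans(points, k)
        score = silhouette_score(points, labels)
        if best is None or score > best[1]:
            best = (k, score, labels)
    return best

# Toy "context vectors": three well-separated groups, one per hypothetical sense.
contexts = [(0, 0), (0, 1), (1, 0), (1, 1),
            (10, 10), (10, 11), (11, 10), (11, 11),
            (0, 10), (0, 11), (1, 10), (1, 11)]
best_k, best_score, best_labels = choose_k_by_silhouette(contexts, range(2, 6))
print(best_k, round(best_score, 3))
```

On this toy data the candidate with three clusters has the highest silhouette score, so the number of induced senses is selected from the data rather than fixed in advance. Real context vectors would be high-dimensional embeddings, and the candidate range for k would have to be chosen for the task.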

Keywords:

Word Sense Induction, Word Embedding, Clustering, Silhouette Score, Unsupervised Machine Learning, Distributional Semantic, Density

Language:

English

Published:

Journal of Information Systems and Telecommunication, Volume 8, Issue 1, Jan-Mar 2020

Pages:

15 to 24

magiran.com/p2157372


Approved scientific journal

Journal of Information Systems and Telecommunication

English-language technical and engineering quarterly

ISSN: 2322-1437 eISSN: 2345-2773

License holder:

Jahad Daneshgahi

Managing director:

Eng. Habibollah Asghari

Editor-in-chief:

Dr. Masoud Shafiee

Journal telephone: 021-88930150
