The role of different types of homograph contexts in measuring document similarity

Article Type:
Research/Original Article (with accredited ranking)
Abstract:
Aim: Automatic information retrieval rests on the assumption that texts contain content or structural elements that can be used for word sense disambiguation, thereby improving the effectiveness of the retrieved results. Homographs are among the words that require sense disambiguation. Depending on their role and position in a text, homograph contexts can be divided into different types, which probably differ in their power to determine document similarity. Using a content analysis method, the present research compares the power of five context types (text citations, references, reference titles, paper titles, and texts) in homograph sense disambiguation.
Methodology
The study examines a test collection of documents built around English homographs: a sample of 3,637 articles containing 19 homographs across 54 subjects, published between 2000 and 2015. Discriminant analysis was used to determine the similarity within, and the differentiation between, the 54 document clusters.
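The abstract does not report the discriminant functions themselves. As a hedged illustration only, the sketch below computes the one-dimensional Fisher criterion, the ratio of between-cluster to within-cluster scatter that linear discriminant analysis maximizes, for a single feature. It assumes each document has already been assigned one numeric score derived from one context type; the function name `fisher_ratio`, the cluster labels and the scores are hypothetical and are not the authors' data or procedure.

```python
from statistics import mean


def fisher_ratio(groups: dict[str, list[float]]) -> float:
    """Between-cluster scatter divided by within-cluster scatter for one feature.

    `groups` maps a (hypothetical) cluster label to the feature values of the
    documents in that cluster. A larger ratio means the feature separates the
    clusters more strongly; 0.0 means the cluster means coincide.
    """
    all_values = [v for values in groups.values() for v in values]
    grand_mean = mean(all_values)
    cluster_means = {label: mean(values) for label, values in groups.items()}
    # Between-cluster scatter: cluster means around the grand mean, weighted by size.
    between = sum(
        len(values) * (cluster_means[label] - grand_mean) ** 2
        for label, values in groups.items()
    )
    # Within-cluster scatter: each value around its own cluster mean.
    within = sum(
        (v - cluster_means[label]) ** 2
        for label, values in groups.items()
        for v in values
    )
    if within == 0:
        # Undefined when every value is identical; treat that as no separation.
        return 0.0 if between == 0 else float("inf")
    return between / within


# Hypothetical scores: each value is one document's similarity to its cluster,
# computed from a single context type (e.g. reference titles).
scores = {
    "cluster_a": [0.8, 0.9, 0.85],
    "cluster_b": [0.2, 0.25, 0.3],
}
print(fisher_ratio(scores))
```

Under this sketch, a context type whose feature yields a ratio near zero would be one that does little to separate documents from one another.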
Findings: The discriminant analyses carried out within each cluster reveal sub-clusters of documents, although the homograph contexts differentiate them only very slightly. Text-citation and reference contexts play the smallest role in differentiating between documents within the clusters.
Conclusion
Documents containing synonymous homographs form clusters within which the documents are fairly similar in terms of their homograph contexts. Furthermore, homograph context types differ in their power to determine similarity. Text-citation and reference contexts showed the highest degree of similarity within the clusters, so these two context types can be used to improve retrieval results. The authors suggest that comparing these two contexts could serve as a tool for secondary ranking or clustering of information retrieval results.
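The suggested use as a secondary ranking signal can be sketched as follows, under stated assumptions: a first-stage retrieval score already exists for each result, each document's reference context is represented as a set of reference identifiers, and overlap is measured with the Jaccard coefficient. The function names, the blending `weight` and the sample data are hypothetical; the article does not specify how the comparison would be built into a ranking.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Share of items two sets have in common; 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def rerank_by_references(
    query_refs: set[str],
    results: list[tuple[str, float, set[str]]],
    weight: float = 0.3,
) -> list[str]:
    """Re-order (doc_id, primary_score, reference_set) tuples.

    The combined score blends the first-stage retrieval score with the
    Jaccard overlap between each result's references and the query's.
    `weight` (hypothetical default) sets the share given to reference overlap.
    """
    def combined(item: tuple[str, float, set[str]]) -> float:
        _doc_id, primary, refs = item
        return (1 - weight) * primary + weight * jaccard(query_refs, refs)

    ranked = sorted(results, key=combined, reverse=True)
    return [doc_id for doc_id, _, _ in ranked]


# Hypothetical data: d1 scores higher in the first stage but shares no references.
query_refs = {"r1", "r2", "r3"}
results = [
    ("d1", 0.90, {"r9"}),
    ("d2", 0.85, {"r1", "r2", "r3"}),
]
print(rerank_by_references(query_refs, results))  # ['d2', 'd1']
```

Keeping the reference overlap as a weighted secondary term, rather than the sole score, leaves the first-stage ranking in charge when two documents cite nothing in common.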

Originality: This is the first study of its kind to define different text contexts and compare them in terms of their power to determine the similarity of texts containing synonymous homographs.
Language:
Persian
Published:
Journal of Information Processing and Management, Volume 33, Issue 3, 2018
Pages:
1183 to 1206
magiran.com/p1839919  