ICT English-Persian comparable textual corpus

Author(s):

Shokoofe Dashtbani, Muharram Mansoorizadeh, Mohammad Nassiri

Abstract:

In computational linguistics, a corpus is a collection of written texts or spoken material in machine-readable form, assembled for studying linguistic structures and language change over time, as well as for natural language processing projects. In this paper, we focus on designing a bilingual corpus. The corpus is built automatically and consists of resources and documents in the ICT domain. We developed a software framework for building textual corpora that reduces construction cost and time. In addition, the software provides corpus management capabilities. We also propose an alignment method for the Persian-English ICT corpus. Our goal is to design an alignment system that extracts corresponding sentences. In this method, we use a bilingual dictionary and artificial intelligence techniques to calculate a score representing the similarity between two sentences. Then we automatically map each pair of sentences across the two languages.
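
The abstract does not spell out how the similarity score is computed, so the following is only a rough sketch of one way a bilingual dictionary and the longest common subsequence (LCS) named in the keywords could be combined. Every name here (`lcs_length`, `translate_tokens`, `similarity`, `align`), the toy dictionary, the normalization `2 * LCS / (len_a + len_b)`, and the 0.5 threshold are illustrative assumptions, not the authors' method.

```python
from typing import Dict, List, Tuple


def lcs_length(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def translate_tokens(tokens: List[str], dictionary: Dict[str, str]) -> List[str]:
    """Replace each token by its dictionary entry; unknown tokens are kept."""
    return [dictionary.get(t.lower(), t.lower()) for t in tokens]


def similarity(src: str, tgt: str, dictionary: Dict[str, str]) -> float:
    """Assumed score: LCS of translated source vs. target, scaled to [0, 1]."""
    src_tokens = translate_tokens(src.split(), dictionary)
    tgt_tokens = [t.lower() for t in tgt.split()]
    if not src_tokens or not tgt_tokens:
        return 0.0
    common = lcs_length(src_tokens, tgt_tokens)
    return 2 * common / (len(src_tokens) + len(tgt_tokens))


def align(src_sents: List[str], tgt_sents: List[str],
          dictionary: Dict[str, str],
          threshold: float = 0.5) -> List[Tuple[int, int, float]]:
    """Pair each source sentence with its best-scoring target sentence,
    keeping only pairs whose score reaches the (assumed) threshold."""
    pairs = []
    for i, s in enumerate(src_sents):
        scores = [similarity(s, t, dictionary) for t in tgt_sents]
        if not scores:
            continue
        j = max(range(len(scores)), key=scores.__getitem__)
        if scores[j] >= threshold:
            pairs.append((i, j, scores[j]))
    return pairs


# Hypothetical toy Persian-to-English dictionary and sentences.
toy_dict = {"شبکه": "network", "امن": "secure", "است": "is", "داده": "data"}
persian = ["شبکه امن است"]
english = ["the network is secure", "data is stored"]
print(align(persian, english, toy_dict))
```

Because LCS is order-sensitive and Persian and English differ in word order, a score of this kind rewards only the words that appear in the same relative order in both sentences; a practical system would likely need a richer dictionary and additional features.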

Keywords:

Computational linguistics, corpus management, sentence alignment, comparable corpus, longest common subsequence (LCS)

Language:

Persian

Published:

Journal of Comparative Linguistic Researches, Volume 4, Issue 8, 2015

Pages:

121 to 141

magiran.com/p1369486

Approved scientific journal

Journal of Comparative Linguistic Researches

Biannual journal in the humanities

ISSN: 2252-0740 eISSN: 2322-4975

Publisher:

Bu-Ali Sina University

Managing Director:

Dr. Mehrdad Naghzguy Kohan

Editor-in-Chief:

Dr. Mohammad Rasekh Mahand

Journal phone: 081-38381192

Corresponding Author (2)

Mansoorizadeh, Muharram

Associate Professor, Computer engineering, Bu-Ali Sina University
