ICT English-Persian comparable textual corpus
In computational linguistics, a corpus is a collection of written texts or spoken material in machine-readable form. It is assembled to study linguistic structures and language change over time, and to support natural language processing projects. In this paper, we focus on designing a bilingual corpus. The corpus is built automatically and consists of resources and documents in the ICT domain. We developed a software framework for building textual corpora that reduces construction cost and time, and that also provides corpus management capabilities. We also propose an alignment method for the Persian-English ICT corpus, with the goal of extracting corresponding sentences. In this method, we use a bilingual dictionary and artificial intelligence techniques to calculate a score representing the similarity between two sentences. We then automatically map each pair of sentences across the two languages.
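The abstract does not specify the scoring function or which AI techniques are used. As a rough illustration only, the following is a minimal sketch that assumes a simple dictionary-overlap (Dice-style) similarity score and a greedy one-to-one matching of Persian and English sentences. The names `BILINGUAL_DICT`, `tokenize`, `similarity`, `align`, and the `threshold` value are hypothetical and are not the authors' method.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy Persian -> English lexicon; a real system would load a
# full bilingual ICT dictionary.
BILINGUAL_DICT: Dict[str, Set[str]] = {
    "شبکه": {"network"},     # network
    "داده": {"data"},        # data
    "امنیت": {"security"},   # security
    "رایانه": {"computer"},  # computer
}

PUNCT = ".,;:!?"


def tokenize(sentence: str) -> List[str]:
    # Naive whitespace tokenization with lowercasing and punctuation stripping.
    # Real Persian text needs proper normalization (ZWNJ, affixes, etc.).
    return [t.strip(PUNCT).lower() for t in sentence.split() if t.strip(PUNCT)]


def similarity(fa: str, en: str, lexicon: Dict[str, Set[str]]) -> float:
    """Dice-style score: Persian tokens whose dictionary translation appears
    in the English sentence, normalized by the sizes of both sentences."""
    fa_tokens = tokenize(fa)
    en_tokens = set(tokenize(en))
    if not fa_tokens or not en_tokens:
        return 0.0
    matched = sum(1 for t in fa_tokens if lexicon.get(t, set()) & en_tokens)
    return 2 * matched / (len(fa_tokens) + len(en_tokens))


def align(
    fa_sents: List[str],
    en_sents: List[str],
    lexicon: Dict[str, Set[str]],
    threshold: float = 0.2,  # hypothetical cut-off for accepting a pair
) -> List[Tuple[int, int, float]]:
    """Greedy one-to-one alignment: take the highest-scoring pairs first,
    skipping any sentence that has already been aligned."""
    scored = [
        (similarity(f, e, lexicon), i, j)
        for i, f in enumerate(fa_sents)
        for j, e in enumerate(en_sents)
    ]
    scored.sort(reverse=True)
    used_fa: Set[int] = set()
    used_en: Set[int] = set()
    pairs: List[Tuple[int, int, float]] = []
    for score, i, j in scored:
        if score < threshold:
            break  # remaining pairs score even lower
        if i in used_fa or j in used_en:
            continue
        used_fa.add(i)
        used_en.add(j)
        pairs.append((i, j, score))
    return sorted(pairs)  # ordered by Persian sentence index


if __name__ == "__main__":
    fa = ["امنیت شبکه مهم است", "داده رایانه"]  # "network security is important", "computer data"
    en = ["Computer data", "Network security is important"]
    for i, j, s in align(fa, en, BILINGUAL_DICT):
        print(f"fa[{i}] <-> en[{j}]  score={s:.2f}")
```

In this toy run, `fa[0]` pairs with `en[1]` (score 0.50) and `fa[1]` with `en[0]` (score 1.00). Greedy matching is chosen here for brevity. A globally optimal assignment (for example, the Hungarian algorithm) or a learned similarity model could replace it without changing the overall structure.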