Written Language Classification in Multilingual Documents

Author(s):

Raziye Khadivi Golkarzadeh*, Saied Sherbaf, Adel Ghazi Khani

Abstract:

Optical character recognition (OCR) is an active area of pattern recognition. Every year, conference papers on the topic are presented in artificial intelligence, pattern recognition, image processing, machine vision, and related fields. However, because of the inherent complexity of the world's languages, there is still strong interest in recognizing text with better results. Researchers have presented many algorithms for converting text images and non-editable text into computer-editable text. Many of these methods rely on the characteristics of one particular written language and can therefore only handle documents written in a single language. In practice, many documents contain two or more different languages, so document recognition systems need to identify several languages simultaneously. In this study, we selected a set of common languages and, based on physical characteristics extracted from them, present a written-language classification algorithm for multilingual documents. Shared features for character recognition can then be extracted within each class. Farsi and Arabic are placed in Class 1; Chinese, Japanese, and Korean in Class 2; and English, Indonesian, and Spanish in Class 3. For each line of the document, the system must identify the class it belongs to. The classifier is a decision tree with adaptive threshold levels. The evaluation data are scanned documents. The recognition rate is 93.3 percent, which supports the effectiveness of the proposed model.
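The line-by-line scheme described in the abstract can be illustrated with a small sketch. The abstract does not say which physical characteristics are measured or how the thresholds adapt, so everything concrete here is an assumption made for illustration: the two features (`connectivity`, `density`), the midpoint rule for the thresholds, and the order of the tree's tests are hypothetical and are not taken from the paper. Only the three output classes and the idea of a decision tree whose thresholds are derived from the document itself come from the abstract.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LineFeatures:
    # Both features are hypothetical stand-ins for the paper's
    # unspecified "physical characteristics".
    connectivity: float  # assumed: share of glyphs joined to a neighbour
    density: float       # assumed: share of ink pixels in the line's box


def adaptive_threshold(values: List[float]) -> float:
    """Midpoint between the smallest and largest value in this document.

    This is one simple way to make a threshold depend on the document;
    the paper's actual rule is not given in the abstract.
    """
    return (min(values) + max(values)) / 2.0


def classify_lines(lines: List[LineFeatures]) -> List[int]:
    """Assign each text line to class 1, 2 or 3 with a two-level tree."""
    if not lines:
        return []
    t_conn = adaptive_threshold([ln.connectivity for ln in lines])
    t_dens = adaptive_threshold([ln.density for ln in lines])
    labels = []
    for ln in lines:
        if ln.connectivity > t_conn:
            labels.append(1)  # Class 1: Farsi, Arabic
        elif ln.density > t_dens:
            labels.append(2)  # Class 2: Chinese, Japanese, Korean
        else:
            labels.append(3)  # Class 3: English, Indonesian, Spanish
    return labels


if __name__ == "__main__":
    page = [
        LineFeatures(connectivity=0.9, density=0.3),
        LineFeatures(connectivity=0.1, density=0.6),
        LineFeatures(connectivity=0.15, density=0.2),
    ]
    print(classify_lines(page))
```

Because the thresholds in this sketch are computed from the lines of the page being classified, a page written entirely in one class would still be split around its own midpoint. A real system would need a fallback for that case, which the abstract does not describe.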

Keywords:

Optical Character Recognition, Classification of Written Language, Feature Extraction, Scanned Documents, Adaptive Decision Tree

Language:

Persian

Published:

Journal of Publishing, Volume 2, Issue 5, 2013

Page:

https://www.magiran.com/p1363192

Journal of Publishing (نشریه چاپ و نشر)

Scientific-promotional humanities quarterly

ISSN: 2322-1062

Publication of this journal has been discontinued.

License holder:

Imam Reza International University

Managing director:

Dr. Rajab Asgharian

Editor-in-chief:

Dr. Mohammad Khazaei

Journal phone: 051-38041 (ext. 2255)
