Classification and recognition of sub-words in old Persian manuscript documents
Historical documents have long been of interest to historians and linguists. Important documents are usually digitized using segmentation and recognition methods, and digitization is essential both for research on these documents and for their preservation. This article proposes a general framework for classifying and recognizing images of Persian historical documents. In the first stage, the document images are pre-processed (noise removal, skew correction, stamp removal, etc.) and binarized into two-level images. In the second stage, a method for segmenting the document text into lines is proposed. In the third stage, a method for segmenting lines into Persian-script sub-words is presented and used to extract the sub-words of these documents. Deep networks are then trained on the frequent sub-words to recognize them, and the results are reported under several evaluation criteria.
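The first three stages can be illustrated with a minimal pure-Python sketch. It rests on assumptions of our own, because the abstract does not name specific methods. Otsu's global threshold stands in for binarization, a horizontal projection profile splits the page into lines, and 8-connected components approximate sub-words, with any component whose column range overlaps its right-hand neighbour merged into it so that dots join their letter body. Noise, skew and stamp removal, as well as the deep-network recognizer, are omitted. All function names (`otsu_threshold`, `binarize`, `segment_lines`, `connected_components`, `group_subwords`, `page_to_subwords`) are hypothetical and are not the authors' code.

```python
from collections import deque


def otsu_threshold(gray):
    """Otsu's threshold for a 2-D list of 0-255 gray levels (assumed binarization method)."""
    hist = [0] * 256
    for row in gray:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = sum0 = 0  # pixel count and intensity sum of the dark class [0, t]
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue
        if w1 == 0:
            break
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        var_between = w0 * w1 * (mean0 - mean1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(gray):
    """Two-level image: 1 for dark ink, 0 for light paper."""
    t = otsu_threshold(gray)
    return [[1 if v <= t else 0 for v in row] for row in gray]


def segment_lines(binary):
    """(start_row, end_row) pairs, end exclusive, from the horizontal projection profile."""
    lines, start = [], None
    for y, row in enumerate(binary):
        has_ink = any(row)
        if has_ink and start is None:
            start = y
        elif not has_ink and start is not None:
            lines.append((start, y))
            start = None
    if start is not None:
        lines.append((start, len(binary)))
    return lines


def connected_components(binary):
    """Bounding boxes (x0, y0, x1, y1) of 8-connected ink components, found by BFS."""
    h, w = len(binary), len(binary[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if not binary[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            queue = deque([(y, x)])
            x0 = x1 = x
            y0 = y1 = y
            while queue:
                cy, cx = queue.popleft()
                x0, x1 = min(x0, cx), max(x1, cx)
                y0, y1 = min(y0, cy), max(y1, cy)
                # Visit the 3x3 neighbourhood, clipped to the image borders.
                for ny in range(max(cy - 1, 0), min(cy + 2, h)):
                    for nx in range(max(cx - 1, 0), min(cx + 2, w)):
                        if binary[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((x0, y0, x1, y1))
    return boxes


def group_subwords(boxes):
    """Merge components whose column ranges overlap (e.g. dots onto their letter body).

    Boxes are visited right to left because Persian is written right to left.
    """
    groups = []
    for x0, y0, x1, y1 in sorted(boxes, key=lambda b: -b[2]):
        if groups and x1 >= groups[-1][0] and x0 <= groups[-1][2]:
            gx0, gy0, gx1, gy1 = groups[-1]
            groups[-1] = (min(gx0, x0), min(gy0, y0), max(gx1, x1), max(gy1, y1))
        else:
            groups.append((x0, y0, x1, y1))
    return groups


def page_to_subwords(gray):
    """Binarize, split into lines, then return sub-word boxes per line (line-relative rows)."""
    binary = binarize(gray)
    return [group_subwords(connected_components(binary[top:bottom]))
            for top, bottom in segment_lines(binary)]
```

The column-overlap merge is a deliberately crude heuristic. Persian sub-words can overlap horizontally, so a real system would probably need a more careful rule, for example one that compares component sizes before attaching a small mark to a larger body.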