Correction and Improvement of the Common Processes in Optical Character Recognition (OCR) of Persian Texts: Using the Features of the Persian Script and a Dimension Transference Algorithm
Since optical character recognition (OCR) technology is essentially based on Latin script, almost all the algorithms and processes used in Persian OCR systems are built on the structure and scriptological features of the Latin alphabet. Applying the means and features of Latin script to the design of Persian OCR systems, however, has not produced adequate recognition of Persian characters and has instead caused confusion for both Persian-speaking users and the systems themselves. This paper discusses and analyzes, step by step, the processes involved in optical character recognition in light of the scriptological features of the Persian script. In doing so, it pinpoints the deficiencies and faults of current Latin-based OCR systems and presents a different view of the Persian writing system in connection with its use in computer software, especially OCR systems, so that the reader can see in practice the potential and capabilities of this complex script in contrast to the simpler Latin writing system. Finally, to upgrade and improve the algorithms currently employed in Persian OCR systems, the geometrical process of transferring two-dimensional specifications into one-dimensional ones is utilized. The proposed algorithm, which is based on the scriptological features of the Persian script, simultaneously makes patterns easier to manipulate, reduces the size of the database, and accelerates data processing.