Improving the output quality of spell checkers using large linguistic corpora

Abstract:
One of the main applications of monolingual corpora is the development of automatic spell checking systems. In such systems, a large monolingual corpus can serve as the lexical database in place of a monolingual dictionary. This study attempts to demonstrate the effectiveness of a very large monolingual corpus of Persian in improving the output quality of a spell checker developed for this language. In the present spelling correction system, the three phases of error detection, suggestion generation, and suggestion ranking are performed in three separate stages. The experiment carried out to evaluate the system demonstrates that it performs very well at detecting erroneous Persian words, though it is not very precise at ranking candidates. Future work will address this latter problem by improving the system's tokenization and by taking context into account.
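The three-stage design described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy English corpus, the Latin alphabet, the single-edit candidate generation, and the frequency-based ranking are all assumptions made for illustration, and every function name is hypothetical. The abstract does not say how the actual system generates or ranks suggestions.

```python
from collections import Counter

# Hypothetical toy corpus; the paper relies on a very large Persian corpus.
CORPUS = "the cat sat on the mat the cat ate the rat".split()
FREQ = Counter(CORPUS)                     # word -> corpus frequency
ALPHABET = "abcdefghijklmnopqrstuvwxyz"    # assumption: Latin letters only


def is_error(word):
    """Stage 1 (detection): a word is flagged if it never occurs in the corpus."""
    return word not in FREQ


def suggest(word):
    """Stage 2 (suggestions): corpus words reachable by one edit operation."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    swaps = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in ALPHABET]
    inserts = [a + c + b for a, b in splits for c in ALPHABET]
    return {w for w in deletes + swaps + replaces + inserts if w in FREQ}


def rank(candidates):
    """Stage 3 (ranking): most frequent first, ties broken alphabetically."""
    return sorted(candidates, key=lambda w: (-FREQ[w], w))


def correct(word):
    """Run the three stages in sequence; known words yield no suggestions."""
    return [] if not is_error(word) else rank(suggest(word))


print(correct("cta"))
print(correct("bat"))
```

Ranking by raw frequency alone ignores the surrounding words, which is the kind of limitation the abstract proposes to address by taking context into account.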
Language:
English
Published:
Mathematical Linguistics, Volume: 1, Issue: 1, Sep 2015
Page:
43
https://www.magiran.com/p1445452