Table of Contents

Mathematical Linguistics, Volume 1, Issue 1, Sep 2015

  • 94 pages
  • Publication date: 1394/07/07 (Solar Hijri; 29 September 2015)
  • Number of titles: 8
  • Karl-Heinz Best, Gabriel Altmann Page 1
    The article revisits the results obtained by A. Boshtan and K.-H. Best for the distribution of simple attributes in texts by three German writers. The texts are compared using a non-parametric test, and it may be conjectured that the result can be generalized to German fiction. A series of further problems associated with the use of attributes is listed.
  • Farhad Bahadori Jahromi Page 9
    In this paper, we focus on methods of coupling the semi-supervised learning of information extractors that extract information (e.g., City(X) and AthletePlaysForTeam(X, Y)) from free text using textual extraction patterns (e.g., “mayor of X” and “Y star quarterback X”). We identify three general types of coupling among target functions that can be combined to form a dense network of coupled learning problems. We then present an approach in which the input to the learner is an ontology defining a set of target categories and relations to be learned, a handful of seed examples for each, and a set of constraints that couple the various categories and relations (e.g., Person and Sport are mutually exclusive). We show that, given this input and millions of unlabeled documents, a semi-supervised learning procedure can achieve significant accuracy improvements by coupling the training of textual pattern-based extractors for dozens of categories and relations. Based on the results reported here, we hypothesize that even greater accuracy improvements will be possible by forming a larger and denser network of inter-constrained learning tasks. The main research contributions of the paper are: (1) it is the first work to couple the simultaneous semi-supervised training of both category and relation extractors based on textual patterns, and (2) it proposes that learning many tasks and coupling them as much as possible leads to more accurate semi-supervised learning, and provides web-scale experimental evidence supporting that claim.
  • Denys Ishutin, Sonja Babic, Hanna Gnatchuk Page 25
    The present investigation studies and compares anglicisms in German, Serbian and Ukrainian. It focuses on the Austrian newspaper “Kleine Zeitung”, the Serbian newspaper “Somborske Novine” and the Ukrainian newspaper “Holos Ukrajinu”. The aim is to observe and compare the development of anglicisms from 1995 to 2015 with the help of the Piotrowski law. To this end, individual issues of the newspapers under consideration were analyzed; in total, 60 newspaper issues were investigated.
  • Tayebeh Mosavi Miangah, Mohammad Javad Rezai Page 43
    One of the main applications of monolingual corpora is the development of automatic spell-checking systems. In such systems, a large monolingual corpus can serve as a database in place of a monolingual dictionary. This study attempts to demonstrate the effectiveness of a very large monolingual corpus of Persian in improving the output quality of a spell checker developed for the language. In the present spelling-correction system, the three phases of error detection, suggestion generation and suggestion ranking are performed as separate stages. An evaluation experiment shows that the system performs very well at detecting misspelled Persian words, although it is less precise at ranking candidates. Future work will address this latter problem by improving the system's tokenization and by taking context into account.
  • Jan Macutek, Barbora Melicherova Page 57
    Distances between words of equal length in Ukrainian texts are analyzed. Word length is measured in syllables. The data are pooled and then modeled by the Gross-Harris geometric distribution. One of the two parameters of the distribution is used as a characteristic of the texts (since words of one, two, three and four syllables are considered, each text is characterized by four values). Cluster analysis is applied to these values. The resulting clusters can contribute to automatic text classification.
  • Masoumeh Shiri, Somayyeh Eslahi, Elham Torkaman Page 71
    The present study investigates the difference in quality between "Google Translate" and human translations of sayings, based on the model of the Automatic Language Processing Advisory Committee (ALPAC). This qualitative study was conducted with three MA translation students and the "Google Translate" program as the machine translation (MT) system. The researchers selected 101 English sayings together with translations of them. Three raters holding an MA in Translation Studies evaluated the fidelity of the sayings as translated by MT and by the three human translators. The results show that the quality of the translations (in this research, fidelity) produced by "Google Translate" is lower than that of the human translations.
  • Hanna Gnatchuk Page 81
    This investigation analyzes English compounds in the American newspaper “The New York Times”, focusing on the parts of speech that make up the compounds and on the cohesion of the compounds. The material consists of one issue of “The New York Times” (Monday, 2 February 2015). The English compounds on each page were analyzed, and the results were processed statistically.
  • Emmerich Kelih Page 91
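The abstract on page 9 describes coupling extractors through constraints such as mutual exclusion (Person and Sport). As a rough illustration only, the Python sketch below applies a mutual-exclusion constraint when promoting candidate instances: a candidate is kept for a category only if it scores strictly higher there than in every category declared mutually exclusive with it. The function name `promote`, the score dictionaries and the strict-greater rule are hypothetical choices for this example, not the paper's procedure.

```python
def promote(candidates, mutex):
    """Hypothetical mutual-exclusion filter for candidate promotion.

    candidates: {category: {instance: score}} from pattern-based extractors.
    mutex:      {category: set of categories mutually exclusive with it}.
    Returns {category: set of promoted instances}.
    """
    promoted = {}
    for category, scores in candidates.items():
        keep = set()
        for instance, score in scores.items():
            # Scores the same instance received in each mutually exclusive category
            # (0.0 if a rival category never proposed it).
            rivals = [candidates.get(other, {}).get(instance, 0.0)
                      for other in mutex.get(category, ())]
            # Assumed rule: promote only if this category clearly wins.
            if all(score > rival for rival in rivals):
                keep.add(instance)
        promoted[category] = keep
    return promoted
```

For example, with Person scoring "Jordan" 0.9 and "Soccer" 0.2, and Sport scoring "Soccer" 0.8 and "Jordan" 0.1, each instance is promoted only to the category where it scores higher, so the constraint acts as negative evidence for the rival category.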
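The Piotrowski law used in the article on page 25 is commonly written in quantitative linguistics in the logistic form p(t) = c / (1 + a·e^(-bt)), where p(t) is the proportion of the spreading form at time t. The sketch below evaluates that curve and, assuming the asymptote c is already known, estimates a and b by linearising ln(c/p - 1) = ln a - bt and fitting a straight line by ordinary least squares. The helper names and the known-c assumption are illustrative; the article may fit the model differently.

```python
import math


def piotrowski(t, c, a, b):
    """Logistic form of the Piotrowski law: p(t) = c / (1 + a * exp(-b * t))."""
    return c / (1.0 + a * math.exp(-b * t))


def fit_piotrowski(ts, ps, c):
    """Hypothetical helper: estimate (a, b) for a known asymptote c.

    Uses ln(c / p - 1) = ln(a) - b * t and ordinary least squares on (t, y).
    Every observed p must satisfy 0 < p < c.
    """
    ys = [math.log(c / p - 1.0) for p in ps]
    n = len(ts)
    mean_t = sum(ts) / n
    mean_y = sum(ys) / n
    sxx = sum((t - mean_t) ** 2 for t in ts)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(ts, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_t
    # Intercept is ln(a); slope is -b.
    return math.exp(intercept), -slope
```

In a study like this one, t could be the number of years since 1995 and p the share of anglicisms in an issue; those inputs are placeholders, not the article's data.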
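The spell checker on page 43 separates error detection, suggestion generation and suggestion ranking, with a monolingual corpus standing in for a dictionary. A minimal sketch of that three-stage structure follows, under three assumptions that are not taken from the article: a word is flagged when it does not occur in the corpus, candidates are corpus words at edit distance 1 (a Norvig-style generator swapped in for illustration), and candidates are ranked by corpus frequency. All function names are hypothetical.

```python
from collections import Counter


def build_lexicon(corpus_tokens):
    """Corpus word frequencies used in place of a dictionary."""
    return Counter(corpus_tokens)


def edits1(word, alphabet):
    """All strings at edit distance 1 (delete, transpose, replace, insert)."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [l + r[1:] for l, r in splits if r]
    transposes = [l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1]
    replaces = [l + ch + r[1:] for l, r in splits if r for ch in alphabet]
    inserts = [l + ch + r for l, r in splits for ch in alphabet]
    return set(deletes + transposes + replaces + inserts)


def check(word, lexicon):
    """Return None if the word is accepted, else a ranked suggestion list."""
    # Stage 1: detection (assumed rule: unseen in the corpus means an error).
    if word in lexicon:
        return None
    # Stage 2: suggestions drawn from corpus words one edit away.
    alphabet = {ch for w in lexicon for ch in w}
    candidates = {w for w in edits1(word, alphabet) if w in lexicon}
    # Stage 3: ranking by corpus frequency, ties broken alphabetically.
    return sorted(candidates, key=lambda w: (-lexicon[w], w))
```

With the toy corpus "the cat sat on the mat the cat", `check("hat", ...)` yields `["cat", "mat", "sat"]`, since "cat" is the most frequent candidate. Context-sensitive ranking, which the abstract names as future work, is not modelled here.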
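For the study on page 57, the raw material is the sequence of word lengths in syllables. The sketch below extracts, for each length, the distances between successive occurrences of words of that length and tabulates how often each distance occurs; fitting the Gross-Harris geometric distribution to these frequencies is not shown. Measuring distance as the difference in word positions (adjacent words at distance 1) is an assumption, and the function names are hypothetical.

```python
from collections import Counter, defaultdict


def equal_length_distances(lengths):
    """Map each word length to the distances between its successive occurrences.

    lengths: word lengths in syllables, in text order.
    Assumed convention: distance = difference of word positions.
    """
    last_pos = {}
    distances = defaultdict(list)
    for pos, length in enumerate(lengths):
        if length in last_pos:
            distances[length].append(pos - last_pos[length])
        last_pos[length] = pos
    return dict(distances)


def distance_frequencies(distances_by_length):
    """Frequency of each distance value, per word length."""
    return {length: Counter(ds) for length, ds in distances_by_length.items()}
```

For the length sequence 1, 2, 1, 1, 3, 2, one-syllable words occur at distances 2 and 1, two-syllable words at distance 4, and the single three-syllable word contributes no distance.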