Magiran | Pazand Quarterly, Issue 30 (Autumn 1391)


Differences between Male and Female Writing Styles in a Selection of English Fiction: A Study in Computational Linguistics

Behrooz Mahmoodi Bakhtiari*, Ali Farsi Nejad Page 5

Today, as a result of systematic studies of various linguistic corpora, sociolinguists and discourse analysts hold that men and women differ in their style and use of language. Nevertheless, the differences between fiction written by female and male authors have not yet been analyzed precisely in scientific and statistical terms. In this article, in order to obtain statistical results and to use computers in the stylistic analysis of men's and women's writing, a small corpus of stories by female and male authors (in English) was first compiled. The corpus was then examined at two levels, general/lexical and part-of-speech, using natural language processing methods ranging from simple counts of linguistic units to statistical methods. The results show that, among the indicators proposed for comparing the writing styles of women and men, female authors make greater use of family-related words, negative words, and the preposition "for". This article is a first step on a path that, it is hoped, will be continued in stylistic studies of contemporary Iranian works once Persian fiction corpora are compiled.

Keywords: stylistics, language and gender, computational linguistics, Robin Lakoff, English novel

Research/Original Article. Language: Persian

Extracting Parallel Sentences from Web Data

Nasrin Bratalipur*, Hosham Faili, Azade Shakeri Page 21

Parallel corpora are a valuable resource in many natural language processing applications and in intelligent cross-lingual information retrieval. Using these corpora requires that they be aligned at the sentence level, but collecting or building them, and aligning them, is very costly. Given the abundance and free availability of bilingual web pages, collecting parallel corpora from the web and aligning them automatically is highly desirable. In this article, to produce parallel sentences, web pages containing parallel sentences are first selected; the features of each Persian-English sentence pair on these pages are then computed; and finally the parallel sentences are extracted with a maximum entropy classifier. One property of the extracted sentences is that they are domain-independent and can cover different semantic domains.

Keywords: parallel corpus, text alignment, web data mining

Research/Original Article. Language: Persian

Meaning Understanding in a Text-Based Dialogue System for the Ticket Reservation Domain

Paria Jamshidlou*, Mohammad Bahrani Page 37

Colloquial language understanding is a specific area of natural language understanding in which the sentences produced by the user do not follow grammar as closely as written sentences do. This article introduces a text-based dialogue system for extracting the meaning of colloquial sentences in the ticket reservation domain. The system is designed with data-driven methods. Its architecture has two main parts: extracting the model parameters, and assigning the most probable semantic tags to a sequence of words. A hidden Markov model is used for this purpose, and the semantic tagging of the word sequence is performed with the Viterbi algorithm. To this end, a corpus of sentences used in the ticket reservation domain is first collected, and a semantic tag is then assigned to each word or combination of words. In the training phase, the possible tag sequences for different word sequences are learned from the tagged corpus. In the test phase, the most probable semantic tag for each word or combination of words is found using the probabilities estimated during training. According to the experiments, the accuracy of the proposed system in identifying the three key tags of origin, destination, and date is 91 percent.

Keywords: meaning understanding, dialogue system, data-driven method, hidden Markov model, Viterbi algorithm

Research/Original Article. Language: Persian

Extracting Semantic Relations between the Verb and Its Dependents from Persian Texts

Mehrnush Shams Fard*, Fatemeh Jafarinejad Page 53

Obtaining the semantic relations between verbs and the other constituents of a sentence is widely useful for semantic processing of the sentence. Knowledge of the selectional restrictions that a verb imposes on its dependents is likewise useful in semantic processing. Although efforts in this direction are under way for various languages, providing such information for verbs manually is costly in human effort and time. Automating the process is therefore very important and of interest to researchers. This article presents three methods for extracting these semantic relations. The method based on morphology and lexical analysis addresses the problem in a simplified way. The generalization-based method obtains selectional restrictions through a statistical examination of the verbs' dependents. In the method based on rules and generalization, semantic role labeling and the discovery of the verbs' selectional restrictions lead to the identification of the dependents. Finally, the methods are compared and the advantages and disadvantages of each are examined.

Keywords: natural language processing, shallow semantic analysis, thematic role extraction, extraction of the selectional restrictions of verbs and their dependents

Research/Original Article. Language: Persian

Sayeh-nama: A Metaphor Understanding System Using the Shared Semantic Features of Two Words in a Metaphorical Context

Hadi Abdi Ghavidel, Afshin Rahimi, Parvaneh Khosravizadeh Page 73

This article introduces a system called Sayeh-nama. The general idea of this system, which deals with metaphor in Persian, is proposed and implemented for the first time. To help with the automatic understanding of metaphor, Sayeh-nama finds the semantic features shared by two words that appear in a metaphorical context. The overall process is as follows: first, the shared semantic features are extracted on the basis of second-order co-occurrence pointwise mutual information, and then adjectives are proposed to describe these features. Applying this method in text understanding systems, and especially in automatic metaphor understanding, considerably improves system performance.

Keywords: Sayeh-nama system, semantic features, pointwise information, second-order co-occurrence

Research/Original Article. Language: Persian

A Corpus-Based Study of Synonymous Words

Shahram Modarres Khiabani Page 85

Church et al. (1991: 12-13), by introducing statistical tools such as the mutual information test and the t-test, demonstrate the importance of such tools in linguistic analysis. Lyons (1995: 62), for his part, counts the difference between the collocates of the two words "big" and "large" among the reasons for the absence of absolute synonymy between them. This article uses the two tools mentioned to examine the differences between synonymous words from the perspective of collocation, while pointing to the importance of linguistic corpora and statistical tools in linguistic research.

Keywords: t-test, mutual information test, synonymy, collocation, linguistic corpus

Research/Original Article. Language: Persian

A Comparison of the Phonetic Structure of the Formal and Colloquial Varieties of Persian

Vahid Mavaji*, Moharram Eslami Page 107

The formal and colloquial varieties of languages often differ from each other, and these differences are seen at all linguistic levels. The degree of difference between the formal and colloquial varieties, sometimes referred to as the difference between speech and writing, varies from language to language. Persian is among the languages in which the difference between the formal and colloquial varieties is very large. This study examines the phonetic differences, that is, the phonetic processes, that occur in Persian when the formal variety is converted into the colloquial variety. The corpus of the study is the Persian "Telephone Farsdat" speech database (Bijankhan et al., 2003), in which continuous speech is labeled at the phonemic and phonetic levels as two independent strings. Aligning these two strings of data shows which phonetic processes are involved in converting the phonemic string into the phonetic string when the two varieties are compared. The Levenshtein algorithm, which is suitable and widely used for approximate matching of differing strings in order to find the distance between them, is used to align the phonemic and phonetic strings. As a result, the differences between the two strings are obtained statistically. On the theoretical side, the results can be used in linguistic descriptions of the Persian sound system, in preparing colloquial Persian resources, and in teaching Persian, especially to non-native speakers. They can also be used in speech technologies such as speech recognition and synthesis, information extraction from colloquial texts, conversion of text into the phonemic string of colloquial Persian, and conversion of the colloquial variety into the formal one.

Keywords: phonetic structure, formal variety, colloquial variety, Levenshtein algorithm, Persian Telephone Farsdat

Research/Original Article. Language: Persian


A Study of Differences in the Writings of Male and Female English Novel Writers: A Computational Linguistics Approach

Behrooz Mahmoodi Bakhtiari*, Ali Farsi Nejad Page 5

Today, thanks to systematic studies of various linguistic corpora, sociolinguists and discourse analysts hold that men and women differ markedly in their style and use of language. However, such differences have not yet been analyzed thoroughly and statistically in the fictional prose of male and female writers. In the present article, computational approaches are used to study gender-based differences in language use in several important novels written in English. First, a small corpus of major English novels was compiled. This corpus was then analyzed with NLP methods, from simple counts of linguistic units up to more complicated statistical methods. The results reveal that, of the features proposed as distinguishing male and female writing, female writers make more use of words about family, negative particles, and the preposition “for”. It is hoped that this article paves the way for similar corpus-based studies of Persian fiction.
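The counting step mentioned in this abstract can be illustrated with a minimal Python sketch. It computes how often three of the reported indicators (family words, negative words, and the preposition "for") occur per 1,000 tokens of a text. The word lists, the tokenizer, and all function names are assumptions made for illustration; the article's actual indicator lists and tools are not given on this page.

```python
import re
from collections import Counter

# Hypothetical word lists, chosen only for illustration.
FAMILY_WORDS = {"mother", "father", "sister", "brother",
                "husband", "wife", "son", "daughter"}
NEGATIVE_WORDS = {"not", "no", "never", "nothing", "nobody", "none"}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def rate_per_thousand(tokens, targets):
    """Occurrences of any target word per 1,000 tokens."""
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    hits = sum(counts[word] for word in targets)
    return 1000.0 * hits / len(tokens)


def profile(text):
    """Relative frequencies of the three indicators for one text."""
    tokens = tokenize(text)
    return {
        "family": rate_per_thousand(tokens, FAMILY_WORDS),
        "negative": rate_per_thousand(tokens, NEGATIVE_WORDS),
        "for": rate_per_thousand(tokens, {"for"}),
    }
```

Comparing `profile(...)` values across texts by female and male authors would give the kind of simple frequency contrast the abstract describes; whether a difference is significant would still need a statistical test.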

Keywords: stylistics, language and gender, computational linguistics, Robin Lakoff, novels in English

Research/Original Article. Original language: Persian

Extracting Parallel Sentences from the Web

Nasrin Bratalipur*, Hosham Faili, Azade Shakeri Page 21

Parallel corpora are regarded as rich linguistic resources for Natural Language Processing and Cross-Language Information Retrieval tasks. Sentences usually need to be aligned before these valuable resources can be used; however, sentence alignment is expensive in terms of time and cost. With the development of the World Wide Web and free access to it, automatically building parallel corpora from the Web is desirable. In this paper, we first choose bilingual pages with parallel content to extract parallel sentence candidates. Then, by computing several features and training a Maximum Entropy classifier, we extract parallel sentences from the candidates. Our approach does not depend on a specific domain and can cover different domains on the Web.
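As a rough illustration of the "features plus Maximum Entropy classifier" step, the sketch below scores candidate sentence pairs with two simple features and a binary maximum entropy model, which for two classes is the same as logistic regression. Everything specific here is an assumption: the two features (length ratio and lexicon coverage), the tiny romanized Persian lexicon, the toy training pairs, and the function names. The paper's actual feature set is not listed in the abstract.

```python
import math

# Hypothetical toy lexicon (romanized Persian -> English).
LEXICON = {"ketab": "book", "khub": "good", "ast": "is", "in": "this"}


def features(fa_tokens, en_tokens):
    """Return [bias, length ratio, lexicon coverage] for one sentence pair."""
    shorter, longer = sorted((len(fa_tokens), len(en_tokens)))
    length_ratio = shorter / longer if longer else 0.0
    translated = [LEXICON[w] for w in fa_tokens if w in LEXICON]
    hits = sum(1 for w in translated if w in en_tokens)
    coverage = hits / len(fa_tokens) if fa_tokens else 0.0
    return [1.0, length_ratio, coverage]


def predict(weights, x):
    """Probability that the pair is parallel under the current weights."""
    z = sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))


def train(examples, epochs=2000, rate=0.5):
    """Fit the weights by stochastic gradient ascent on the log-likelihood."""
    weights = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        for x, y in examples:
            error = y - predict(weights, x)
            weights = [w + rate * error * xi for w, xi in zip(weights, x)]
    return weights


# Toy labelled pairs: 1 = parallel, 0 = not parallel (invented for the sketch).
TRAINING_PAIRS = [
    (["in", "ketab", "khub", "ast"], ["this", "book", "is", "good"], 1),
    (["ketab", "khub", "ast"], ["the", "book", "is", "good"], 1),
    (["in", "ketab", "khub", "ast"],
     ["the", "weather", "was", "cold", "and", "wet", "today", "again"], 0),
    (["in", "ketab", "ast"], ["this", "train", "leaves", "soon"], 0),
]

WEIGHTS = train([(features(fa, en), y) for fa, en, y in TRAINING_PAIRS])
```

With the trained `WEIGHTS`, `predict(WEIGHTS, features(fa, en))` gives a score for a new candidate pair; a real system would choose a threshold on held-out data.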

Keywords: parallel corpus, sentence alignment, web processing

Research/Original Article. Original language: Persian

Understanding Meaning in a Text-Based Dialogue System for the Specific Domain of Ticket Reservation

Paria Jamshidlou*, Mohammad Bahrani Page 37

Spoken language understanding is a specific area of natural language understanding in which the uttered sentences are not as well-formed as written sentences. In the present paper, a text-based spoken language understanding system is introduced for the ticket reservation domain. The system follows a data-driven approach, and its architecture has two main parts: first, extracting the parameters of the model, and second, assigning the most likely semantic tags to the sequence of words. A Hidden Markov Model and the Viterbi algorithm are applied to train the parameters and to tag the sequence of words. For this purpose, a corpus of commonly used sentences in the ticket reservation domain is collected, and a specific tag is assigned to each word or combination of words. In the training step, the possible tag sequences for various word sequences are learned from the tagged corpus; in the testing step, the most likely tag is assigned to each word or combination of words according to the probabilities calculated in the training step. The system's accuracy in recognizing the three key tags of departure, arrival, and date is 91%.
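The two steps described here, estimating HMM parameters from a tagged corpus and decoding with the Viterbi algorithm, can be sketched in a few lines of Python. The toy English corpus, the tag names (`FROM`, `ORIGIN`, `TO`, `DEST`, `ON`, `DATE`), and the add-one smoothing are assumptions made for this illustration; the paper's corpus, tag set, and estimation details are not given in the abstract.

```python
import math
from collections import Counter, defaultdict

# Hypothetical tagged sentences, invented for this sketch.
TAGGED_CORPUS = [
    [("from", "FROM"), ("tehran", "ORIGIN"), ("to", "TO"),
     ("shiraz", "DEST"), ("on", "ON"), ("friday", "DATE")],
    [("to", "TO"), ("mashhad", "DEST"), ("from", "FROM"),
     ("tabriz", "ORIGIN"), ("on", "ON"), ("monday", "DATE")],
]


def train_hmm(tagged_sentences):
    """Count start, transition, and emission events in the tagged corpus."""
    start, trans, emit = Counter(), defaultdict(Counter), defaultdict(Counter)
    vocab = set()
    for sentence in tagged_sentences:
        prev = None
        for word, tag in sentence:
            vocab.add(word)
            emit[tag][word] += 1
            if prev is None:
                start[tag] += 1
            else:
                trans[prev][tag] += 1
            prev = tag
    tags = sorted(emit)
    return start, trans, emit, tags, len(vocab) + 1  # +1 slot for unknown words


def log_prob(count, total, n_outcomes):
    """Add-one smoothed log probability."""
    return math.log((count + 1) / (total + n_outcomes))


def viterbi(words, model):
    """Most probable tag sequence for a non-empty list of words."""
    start, trans, emit, tags, vocab_size = model

    def e(tag, word):
        return log_prob(emit[tag][word], sum(emit[tag].values()), vocab_size)

    def t(prev, tag):
        return log_prob(trans[prev][tag], sum(trans[prev].values()), len(tags))

    n_start = sum(start.values())
    # Each column maps tag -> (best log score, best previous tag).
    best = [{tag: (log_prob(start[tag], n_start, len(tags)) + e(tag, words[0]), None)
             for tag in tags}]
    for word in words[1:]:
        column = {}
        for tag in tags:
            prev = max(tags, key=lambda p: best[-1][p][0] + t(p, tag))
            column[tag] = (best[-1][prev][0] + t(prev, tag) + e(tag, word), prev)
        best.append(column)
    # Follow the back-pointers from the best final tag.
    tag = max(tags, key=lambda x: best[-1][x][0])
    path = [tag]
    for column in reversed(best[1:]):
        tag = column[tag][1]
        path.append(tag)
    return path[::-1]
```

For example, `viterbi("from isfahan to yazd on friday".split(), train_hmm(TAGGED_CORPUS))` tags the two unseen city names from the transition probabilities alone, since their smoothed emission probabilities are the same for every tag.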

Keywords: natural language understanding, spoken dialogue system, data-driven approach, Hidden Markov Model, Viterbi algorithm

Research/Original Article. Original language: Persian

Extracting Semantic Relations between Verbs and their Arguments from Persian Texts

Mehrnush Shams Fard*, Fatemeh Jafarinejad Page 53

Extracting semantic relations between a verb and its arguments in a sentence is useful for many natural language processing applications. In addition, the selectional restrictions that a verb imposes on its arguments can be used in the semantic processing of texts. Manually extracting the argument structure of verbs, along with the selectional restrictions of all arguments of all verbs, is very time-consuming and costly. Automating this task is therefore of interest to researchers in semantic text processing. In this paper, we propose three approaches to extracting semantic relations between a verb and its arguments in a sentence. The first and simplest is based on morphology and the lexical analysis of words. The second is based on generalization and extracts the selectional restrictions through a statistical study of the arguments. The third is based on rules and generalization, and labels the semantic roles in addition to extracting the selectional restrictions. After explaining the approaches, we compare them and discuss their pros and cons.
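The idea behind the second, generalization-based approach can be sketched as follows: observed arguments of a verb are mapped to semantic classes, and classes that account for a large enough share of a slot are kept as that slot's selectional restriction. This is a simplified illustration under stated assumptions, not the authors' procedure: the noun-to-class lexicon, the `min_share` threshold, and the function name are all hypothetical, and English words stand in for Persian ones.

```python
from collections import Counter, defaultdict

# Hypothetical semantic classes; a real system might take them from a wordnet.
NOUN_CLASS = {
    "apple": "food", "bread": "food", "rice": "food",
    "book": "artifact", "letter": "artifact",
    "child": "human", "teacher": "human",
}


def selectional_preferences(observations, min_share=0.5):
    """Generalize (verb, slot, noun) observations to per-slot semantic classes.

    A class is kept for a slot when it covers at least `min_share` of the
    nouns observed in that slot.
    """
    counts = defaultdict(Counter)
    for verb, slot, noun in observations:
        counts[(verb, slot)][NOUN_CLASS.get(noun, "unknown")] += 1
    preferences = {}
    for key, class_counts in counts.items():
        total = sum(class_counts.values())
        preferences[key] = sorted(
            cls for cls, n in class_counts.items() if n / total >= min_share
        )
    return preferences
```

Given observations such as `("eat", "object", "apple")`, `("eat", "object", "bread")`, and `("eat", "object", "book")`, the object slot of "eat" generalizes to the class `food`, because the single `artifact` observation falls below the threshold.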

Keywords: natural language processing, shallow semantic parsing, extracting semantic relations, extracting the selectional restrictions of verbs

Research/Original Article. Original language: Persian

"Sayeh-nama": A System for Understanding Metaphors Using the Shared Semantic Features of Term Pairs in the Metaphorical Contexts

Hadi Abdi Ghavidel, Afshin Rahimi, Parvaneh Khosravizadeh Page 73

This paper introduces a system named “Sayeh-nama”. The general idea of this system, which addresses metaphor in Persian, is proposed and implemented here for the first time. To help with the automatic understanding of metaphors, Sayeh-nama finds the semantic features shared by two terms in a metaphorical context. The overall process consists of two phases. First, the shared semantic features are extracted on the basis of second-order co-occurrence pointwise mutual information; then a number of adjectives are offered to describe these features. Applying this method in text understanding systems, and especially in automatic metaphor understanding, considerably improves system performance.
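A much simplified version of the first phase can be sketched in Python: compute pointwise mutual information (PMI) between each of the two terms and every context word, then keep the context words that are positively associated with both. This is only an approximation of the idea of shared features; it is not the full second-order co-occurrence PMI measure, and the toy sentences and function names are assumptions made for illustration.

```python
import math
from collections import Counter
from itertools import combinations

# Hypothetical toy corpus: each sentence is a list of tokens.
SENTENCES = [
    ["lion", "brave", "strong"],
    ["lion", "brave", "wild"],
    ["soldier", "brave", "loyal"],
    ["soldier", "strong", "brave"],
    ["table", "wooden", "flat"],
    ["table", "flat", "wide"],
]


def build_counts(sentences):
    """Sentence-level frequencies for single words and for word pairs."""
    word_df, pair_df = Counter(), Counter()
    for sentence in sentences:
        types = sorted(set(sentence))
        word_df.update(types)
        pair_df.update(combinations(types, 2))  # pairs are stored in sorted order
    return word_df, pair_df, len(sentences)


def pmi(x, y, word_df, pair_df, n):
    """PMI of two words, using sentence-level co-occurrence."""
    joint = pair_df[tuple(sorted((x, y)))]
    if joint == 0:
        return float("-inf")
    return math.log2(joint * n / (word_df[x] * word_df[y]))


def shared_features(a, b, counts):
    """Context words with positive PMI for both terms, strongest first."""
    word_df, _, _ = counts
    scored = []
    for word in word_df:
        if word in (a, b):
            continue
        pa, pb = pmi(a, word, *counts), pmi(b, word, *counts)
        if pa > 0 and pb > 0:
            scored.append((-(pa + pb), word))
    return [word for _, word in sorted(scored)]
```

For a metaphor such as "the soldier is a lion", `shared_features("lion", "soldier", build_counts(SENTENCES))` returns the context words both terms associate with, which is the kind of candidate feature the second phase would then describe with adjectives.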

Keywords: Sayeh-nama, semantic features, second-order co-occurrence, pointwise mutual information

Research/Original Article. Original language: Persian

A Corpus-Based Analysis of Synonyms

Shahram Modarres Khiabani Page 85

Church et al. (1991: 12-13) introduced statistical measures such as the Mutual Information (MI) test and the t-score to identify significant lexical relations, in particular to estimate the association between two words. Lyons (1995: 62), for his part, noted that the lack of absolute synonymy between the pair ‘big’ and ‘large’ results from the difference in their collocates. This paper uses these two tests to study the differences between synonymous pairs in terms of collocation, while noting the significance of linguistic corpora and statistical measures in linguistic studies.
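For readers unfamiliar with the two measures, the sketch below computes them from raw corpus counts, using the forms commonly used in corpus linguistics: MI = log2(f(x,y) · N / (f(x) · f(y))) and t = (f(x,y) − f(x) · f(y) / N) / √f(x,y). The function and parameter names are assumptions, and any counts passed in are the user's own; no figures from the article are reproduced here.

```python
import math


def mutual_information(f_xy, f_x, f_y, n):
    """MI score: log2 of observed over expected co-occurrence frequency.

    f_xy: co-occurrence count of the two words (must be > 0)
    f_x, f_y: individual word counts; n: corpus size in tokens
    """
    return math.log2(f_xy * n / (f_x * f_y))


def t_score(f_xy, f_x, f_y, n):
    """t-score: (observed - expected) / sqrt(observed)."""
    expected = f_x * f_y / n
    return (f_xy - expected) / math.sqrt(f_xy)
```

MI tends to favor rare but exclusive word pairs, while the t-score favors frequent ones, which is one reason the two are often reported together when comparing the collocates of near-synonyms.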

Keywords: t-score, Mutual Information (MI) test, synonymy, collocation, linguistic corpora

Research/Original Article. Original language: Persian

A Comparative Study of Phonetic Structure of Formal and Colloquial Persian

Vahid Mavaji*, Moharram Eslami Page 107

Formal and colloquial varieties of languages differ at all linguistic levels. The degree of difference between formal and colloquial varieties is not the same in all languages. Persian is one of the languages in which the differences between the formal and colloquial varieties are considerable. This study investigates the phonological processes that turn the segmental string of formal Persian into its colloquial counterpart, using the telephone speech database T-Farsdat, in which continuous speech has been segmented and annotated at two levels, phonemic and phonetic. Aligning these two strings reveals the types of phonological processes active in changing formal into colloquial Persian. The Levenshtein algorithm was used to align the phonemic and phonetic strings in order to show the type and frequency of the phonetic differences between the formal and colloquial varieties of the language. The results of this study can be used in theoretical studies of the language and in developing technologies for it.
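The alignment step can be illustrated with a short Python sketch of the Levenshtein algorithm that, besides the distance, backtracks through the table to list the substitutions, deletions, and insertions separating a phonemic string from a phonetic one. The function name, the operation labels, and the romanized example transcriptions are assumptions made for illustration and are not taken from T-Farsdat.

```python
from collections import Counter


def align(phonemic, phonetic):
    """Levenshtein distance plus the edit operations that realize it."""
    n, m = len(phonemic), len(phonetic)
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dist[i][0] = i
    for j in range(m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if phonemic[i - 1] == phonetic[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1,         # deletion
                             dist[i][j - 1] + 1,         # insertion
                             dist[i - 1][j - 1] + cost)  # match or substitution
    # Backtrack from the bottom-right cell to recover one optimal alignment.
    ops = []
    i, j = n, m
    while i > 0 or j > 0:
        mismatch = i > 0 and j > 0 and phonemic[i - 1] != phonetic[j - 1]
        if i > 0 and j > 0 and dist[i][j] == dist[i - 1][j - 1] + int(mismatch):
            if mismatch:
                ops.append(("substitute", phonemic[i - 1], phonetic[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            ops.append(("delete", phonemic[i - 1], None))
            i -= 1
        else:
            ops.append(("insert", None, phonetic[j - 1]))
            j -= 1
    return dist[n][m], ops[::-1]


def process_counts(pairs):
    """Tally operation types over many (phonemic, phonetic) string pairs."""
    tally = Counter()
    for phonemic, phonetic in pairs:
        _, ops = align(phonemic, phonetic)
        tally.update(op for op, _, _ in ops)
    return tally
```

For instance, aligning an illustrative formal form `"miravam"` with a colloquial form `"miram"` yields two deletions, and `"nAn"` with `"nun"` yields one substitution; tallying such operations over a whole corpus gives the type and frequency of each process.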

Keywords: phonetic structure, formal variety, colloquial variety, Levenshtein Algorithm, T-Farsdat

Research/Original Article. Original language: Persian

Journal of Pazand
