Table of Contents

International Journal of Language Testing
Volume 4, Issue 2, Oct 2014

  • Publication date: 1394/07/20 (Solar Hijri calendar)
  • Number of articles: 4
  • Paul Joyce* Page 155
    This paper reports on the development of a test of second language (L2) connected speech comprehension. Despite the importance of connected speech to L2 listening comprehension, there has been no theoretically and empirically sound means of measuring learners' understanding of it. The reduced forms test was therefore developed to address this need. The assessed material was delivered through a dictation that contained a wide range of frequently occurring reduced forms. To ensure the trait purity of the instrument, the dictation consisted of a series of short decontextualised sentences of great lexical and syntactic simplicity. The test was piloted with Japanese university students ranging from false beginner to upper-intermediate proficiency. During test development, both Classical Test Theory and Item Response Theory approaches to test evaluation and item selection were utilised. The second version of the connected speech dictation test was administered to 655 participants. The findings showed that all of the items fitted the Rasch model, and the test is therefore considered a valid measure of reduced forms L2 English listening. Furthermore, the results indicated that Japanese L2 learners have difficulty recognising even the most frequently used English words when they are spoken in fluent native speaker discourse. It was concluded that the teaching of reduced forms should constitute a more important part of the L2 listening curriculum.
    Keywords: Second language, listening, connected speech, reduced forms, test development, testing
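    For orientation, a minimal sketch of the dichotomous Rasch model referenced above, in its standard textbook form (the general model, not the authors' specific estimation procedure): the probability that person n answers item i correctly is
    \[ P(X_{ni} = 1 \mid \theta_n, b_i) = \frac{e^{\theta_n - b_i}}{1 + e^{\theta_n - b_i}} \]
    where \theta_n is person ability and b_i is item difficulty. An item is said to fit when its observed responses follow this curve, conventionally judged by infit/outfit mean-square statistics close to 1.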
  • Nasser Rashidi*, Faezeh Safari Page 175
    With the widespread use of multiple-choice (MC) tests, even though they are disapproved of by many practitioners, investigating the performance of such tests and their consequent features is desirable. The focus of this study was a modified version of the multiple-choice test known as multitrak. The study compared the multitrak test scores of 71 students against those of standard MC and constructed-response (CR) tests. The tests employed in the study evaluated English grammar, and all had identically worded stems. The results showed that multitrak items are at a higher level of difficulty than the other formats. They suggest that these items can be used to test more advanced aspects of grammatical competence, since the test taker must go beyond mere syntactic knowledge and be competent in the range of alternatives used in communication in order to find the unacceptable choice. The multitrak test is therefore better geared to higher levels of proficiency and could provide better information about more proficient test takers. Finally, implications of the study for test constructors and test users, as well as for future research, are discussed.
    Keywords: Multitrak test, standard multiple-choice test, constructed-response test, testing grammar, test format
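    As background to the difficulty comparison above, the classical (CTT) facility index conventionally used in such format comparisons (a standard definition, not taken from the paper) is
    \[ p_i = \frac{\text{number of examinees answering item } i \text{ correctly}}{N} \]
    so lower p_i values indicate harder items, and format-level difficulty can be compared through the mean p across the items of each format.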
  • Seyed Mohammad Reza Amirian*, Seyed Mohammad Alavi, Angel M. Fidalgo Page 187
    The aim of the present study is twofold. First, the paper investigated whether the University of Tehran English Proficiency Test (UTEPT) manifested substantial gender Differential Item Functioning (DIF). Second, the flagged DIF items were subjected to a content analysis to determine the underlying sources of DIF. Mantel-Haenszel (MH) and Logistic Regression (LR), two popular methods of DIF detection, were employed to analyze the data obtained from 0331 test takers in 0101. The findings indicated that even though 082 of the items were initially flagged by MH and LR as displaying gender DIF, the effect size of the DIF was mostly negligible. Moreover, the content analysis phase of the study showed that it is sometimes difficult to hypothesize the linguistic element causing DIF in an item. However, humanities-oriented items were rated as favoring females, and science-oriented items as favoring males. Finally, a correlation index of .01 indicated that MH and LR produce highly consistent DIF results. These findings are discussed, and implications for test developers and DIF researchers are provided.
    Keywords: Fairness, DIF, uniform DIF, non-uniform DIF, MH, LR, UTEPT
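    For orientation, minimal sketches of the two DIF statistics named above, in their standard textbook forms (general formulations, not the authors' exact specifications). The MH procedure estimates a common odds ratio across matched score levels k,
    \[ \hat{\alpha}_{MH} = \frac{\sum_k A_k D_k / N_k}{\sum_k B_k C_k / N_k} \]
    where A_k and B_k are the correct/incorrect counts for the reference group and C_k and D_k those for the focal group at score level k; it is often reported on the ETS delta scale as MH D-DIF = -2.35 ln(\hat{\alpha}_{MH}). The LR approach compares nested models of the form
    \[ \operatorname{logit} P(\text{correct}) = \beta_0 + \beta_1\,\text{score} + \beta_2\,\text{group} + \beta_3\,(\text{score} \times \text{group}) \]
    where a significant \beta_2 indicates uniform DIF and a significant \beta_3 indicates non-uniform DIF.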
  • Nasim Ghanbari*, Hossein Barati Page 204
    The present study explores the practice of Iranian raters in the country's EFL writing assessment context. To this end, early in the study a questionnaire developed by the researcher was administered to thirty experienced raters from ten major state universities in Iran. Later, five of the raters were chosen to participate in a follow-up think-aloud session aimed at further exploring the rating process. Findings from the questionnaire cast doubt on the existence of an explicit rating scale in the context. The subsequent think-aloud protocols further revealed that despite the apparent superiority of the raters in the rating process, neither the raters nor the rating scale, the two central components of performance assessment, had real agency in the Iranian EFL rating context. The lack of a common rating scale caused the raters to draw on ad hoc rating criteria, and the raters' idiosyncratic practices, resulting from a missing rater training component, created a context in which the construct of writing ability is greatly underrepresented. Along with locating the sources of the problem in both the rating scale and the raters, this study emphasizes that Iranian raters are in urgent need of training and constant monitoring programs to acquire rating expertise over time. As a prerequisite for such training programs, the development of an explicit rating scale in this context is strongly encouraged.
    Keywords: EFL writing assessment, rating scale, rater, validation, think-aloud protocols