A comparison of estimated achievement scores obtained from the Student Achievement Assessment Test using classical test theory and unidimensional and multidimensional IRT
Öğrenci başarılarının belirlenmesi sınavından klasik test kuramı, tek ve çok boyutlu madde tepki kuramı modelleri ile kestirilen başarı puanlarının karşılaştırılması
Keywords:
Unidimensional and multidimensional item response theory, classical test theory, ability estimation, dimensionality, ÖBBS
Abstract
This study examines how accurately achievement measures are estimated in a test battery. It empirically compares the achievement measures obtained by applying classical test theory (CTT) and unidimensional and multidimensional item response theory (IRT) models to the Turkish and mathematics subtests of the Student Achievement Assessment Test (ÖBBS-2008), and seeks to identify the model that estimates students' achievement with the least error. For the Turkish test, the ability parameters estimated from the whole test under multidimensional IRT had somewhat lower error, and therefore gave more precise measurement, than both the ability parameters estimated under unidimensional IRT at the subdimension level and the test scores obtained under CTT. The mathematics test gave similar results. Overall, the parameters estimated under multidimensional IRT had somewhat lower error.
Özet (Summary)
This study aimed to determine the accuracy of the estimation of achievement measures in a test battery and to compare empirically the achievement measures obtained by applying Classical Test Theory (CTT) and unidimensional and multidimensional Item Response Theory (IRT) models to the Turkish and mathematics subtest data of the Student Achievement Assessment Test (ÖBBS-2008). In making these comparisons, the study sought to identify the model that estimates achievement measures with the least error. Analysis of the Turkish test data showed that the ability parameters estimated from the whole test with multidimensional IRT had partially lower standard errors than the ability parameters estimated with unidimensional IRT at the subdimension level and the test scores obtained under CTT. Analysis of the mathematics test data showed that, in estimating ability parameters, the lowest error was obtained under multidimensional IRT, while the highest error was obtained from the scores determined with unidimensional IRT from the subdimensions of the mathematics test and with CTT from the whole test.
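The comparison above rests on the error attached to each score. As a rough illustration only, and not the authors' procedure, the sketch below shows the textbook error formulas for two of the three approaches: the CTT standard error of measurement, SD * sqrt(1 - reliability), and the standard error of an ability estimate under a unidimensional two-parameter logistic (2PL) IRT model, 1 / sqrt(test information). The choice of the 2PL model, the function names, and all numeric values are assumptions made for the example; none come from the ÖBBS-2008 data, and the multidimensional case is not shown.

```python
import math


def ctt_sem(score_sd, reliability):
    """CTT standard error of measurement: SD * sqrt(1 - reliability)."""
    return score_sd * math.sqrt(1.0 - reliability)


def p_2pl(theta, a, b):
    """Probability of a correct response under the 2PL model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def irt_se(theta, items):
    """Standard error of an ability estimate: 1 / sqrt(test information).

    `items` is a list of (a, b) pairs: discrimination and difficulty.
    Item information under the 2PL model is a^2 * P * (1 - P).
    """
    info = 0.0
    for a, b in items:
        p = p_2pl(theta, a, b)
        info += a * a * p * (1.0 - p)
    return 1.0 / math.sqrt(info)


# Hypothetical item parameters and score statistics, for illustration only.
items = [(1.0, -1.0), (1.2, 0.0), (0.8, 1.0), (1.5, 0.5)]

sem = ctt_sem(score_sd=4.0, reliability=0.84)  # one value for all examinees
se_low = irt_se(-2.0, items)                   # varies with ability level
se_mid = irt_se(0.0, items)

print(f"CTT SEM (raw-score units): {sem:.3f}")
print(f"IRT SE at theta = -2.0: {se_low:.3f}")
print(f"IRT SE at theta =  0.0: {se_mid:.3f}")
```

The CTT figure is a single value on the raw-score scale, whereas the IRT figure is on the ability scale and changes with the examinee's ability level, so the two are not directly comparable without further work.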
License
Authors can retain copyright, while granting the journal right of first publication. Alternatively, authors can transfer copyright to the journal, which then permits authors non-commercial use of the work, including the right to place it in an open access archive. In addition, Creative Commons can be consulted for flexible copyright licenses.
©1999 Creative Commons Attribution-ShareAlike 4.0 International License.