The effect of item pool size and stopping rule on bias and error parameters in computerized adaptive testing
DOI: https://doi.org/10.14687/jhs.v21i4.6513

Keywords: computerized adaptive testing, item pool size, stopping rule, item parameters, error parameters

Abstract
This study examines the effects of item pool size and stopping rule on bias and error parameters in computerized adaptive testing (CAT). Simulated data were generated and analyzed for the eight conditions formed by crossing item pool size (200 vs. 2000 items), stopping rule (fixed length vs. error based), and ability estimation method (Maximum Likelihood, ML, vs. Expected a Posteriori, EAP). For each sub-problem, the findings are summarized in a table and a set of graphs. In light of the findings, with the fixed-length stopping rule, the error and bias values obtained with the ML ability estimation method were lower than those obtained with the EAP method in both the small and the large item pool. Similarly, with the error stopping rule, the error and bias values obtained with the ML method were lower than those obtained with the EAP method in both item pools. Under the error stopping rule, approximately six items were administered with both estimation methods in the small item pool, whereas in the large item pool approximately five items were administered with the EAP method and approximately 21 with the ML method. The condition with the lowest RMSE and bias values and the highest correlation between true and estimated θ values was the sixth sub-problem, which combined a large item pool, a fixed-length stopping rule, and ML ability estimation. The results provide clues for developing testing strategies in CAT applications and offer guidance for more effective assessment processes in education.
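A design of this kind can be sketched in R with the catR package, which provides item pool generation and CAT simulation routines. The sketch below is a minimal, illustrative reconstruction, not the study's actual code: the number of simulees (100), the fixed test length (20 items), the standard-error threshold (0.30), and the MFI item selection rule are all assumptions introduced here for the example.

```r
library(catR)

set.seed(123)
true_theta <- rnorm(100)  # simulated true abilities (assumed N = 100)

# Two dichotomous (2PL) item pools: "small" (200 items) and "large" (2000 items)
pools <- list(small = genDichoMatrix(items = 200,  model = "2PL"),
              large = genDichoMatrix(items = 2000, model = "2PL"))

# Two stopping rules: fixed length (assumed 20 items) and an error-based
# ("precision") rule (assumed SE <= 0.30)
stops <- list(fixed = list(rule = "length",    thr = 20),
              error = list(rule = "precision", thr = 0.30))

# Eight conditions: pool size x stopping rule x estimation method
conditions <- expand.grid(pool = names(pools), stop = names(stops),
                          method = c("ML", "EAP"), stringsAsFactors = FALSE)
conditions$bias <- conditions$rmse <- conditions$cor <- conditions$mean_items <- NA

for (i in seq_len(nrow(conditions))) {
  bank <- pools[[conditions$pool[i]]]
  est <- n_items <- numeric(length(true_theta))
  for (j in seq_along(true_theta)) {
    res <- randomCAT(trueTheta = true_theta[j], itemBank = bank,
                     test  = list(method = conditions$method[i], itemSelect = "MFI"),
                     stop  = stops[[conditions$stop[i]]],
                     final = list(method = conditions$method[i]))
    est[j]     <- res$thFinal              # final ability estimate
    n_items[j] <- length(res$testItems)    # number of items administered
  }
  conditions$bias[i]       <- mean(est - true_theta)            # bias = mean(theta_hat - theta)
  conditions$rmse[i]       <- sqrt(mean((est - true_theta)^2))  # RMSE
  conditions$cor[i]        <- cor(est, true_theta)              # true vs. estimated theta
  conditions$mean_items[i] <- mean(n_items)                     # average test length
}
print(conditions)
```

The eight rows of `conditions` correspond to the eight sub-problems and report, for each, the bias, RMSE, the correlation between true and estimated θ, and the average number of items administered, i.e., the quantities compared across conditions above.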
License
Copyright (c) 2024 Journal of Human Sciences

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Authors may retain copyright while granting the journal the right of first publication. Alternatively, authors may transfer copyright to the journal, which then permits authors non-commercial use of the work, including the right to place it in an open access archive. Creative Commons can also be consulted for flexible copyright licenses.