Adjusting Regression Models for Overfitting in Second Language Research
Keywords: regression modeling, validation, bootstrap
Regression modeling is an increasingly important quantitative tool in second language (L2) research. Although it is superior in many ways to more traditional methods such as ANOVA, regression modeling is, like any procedure, vulnerable to problems in how it is applied, ranging from small sample sizes to a lack of screening for outliers and influential data points (Plonsky and Ghanbar, 2018). Because these problems are common in L2 research, existing studies that use regression may overfit their data and, as a result, inflate effect size estimates. These issues can be partially alleviated through robust statistical procedures such as model validation. This paper gives L2 researchers an overview of these issues and an instructive look at one robust validation method: bootstrapping.
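To make the idea of bootstrap validation concrete, the following is a minimal sketch of one common variant, the optimism-corrected bootstrap for R². The abstract does not say which bootstrap procedure the paper adopts, so this is an illustrative assumption rather than the authors' method. The data are simulated, and the function names (`simulate`, `fit_line`, `r_squared`, `optimism_corrected_r2`) are hypothetical. A single predictor is used so that the example needs only the Python standard library.

```python
import random
import statistics


def simulate(n=20, seed=0):
    """Simulated small-sample data (an assumption, not data from the paper)."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    y = [0.5 * xi + rng.gauss(0, 1) for xi in x]
    return x, y


def fit_line(x, y):
    """Ordinary least squares with one predictor; returns (intercept, slope)."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def r_squared(x, y, intercept, slope):
    """Proportion of variance in y explained by the given line."""
    mean_y = statistics.fmean(y)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


def optimism_corrected_r2(x, y, n_boot=1000, seed=1):
    """Return (apparent R^2, optimism-corrected R^2)."""
    rng = random.Random(seed)
    n = len(x)
    apparent = r_squared(x, y, *fit_line(x, y))
    optimism = []
    for _ in range(n_boot):
        # Resample cases with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        bx = [x[i] for i in idx]
        by = [y[i] for i in idx]
        if len(set(bx)) < 2 or len(set(by)) < 2:
            continue  # skip degenerate resamples with no variance
        b0, b1 = fit_line(bx, by)
        # Optimism: fit on the resample minus fit of the same model
        # when it is applied back to the original data.
        optimism.append(r_squared(bx, by, b0, b1) - r_squared(x, y, b0, b1))
    return apparent, apparent - statistics.fmean(optimism)


if __name__ == "__main__":
    x, y = simulate()
    apparent, corrected = optimism_corrected_r2(x, y)
    print(f"apparent R^2 = {apparent:.3f}, corrected R^2 = {corrected:.3f}")
```

The gap between the two printed values is an estimate of how much the apparent R² overstates the model's fit to new data. With small samples of the kind described above, that gap tends to be larger.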
DeKeyser, R. M. (2000). The robustness of critical period effects in second language acquisition. Studies in Second Language Acquisition 22 (4), 499-533.
Egbert, J. and Plonsky, L. (in press). Bootstrapping techniques. In S. T. Gries and M. Paquot (Eds.), A Practical Handbook of Corpus Linguistics. New York: Springer.
James, G., Witten, D., Hastie, T., and Tibshirani, R. (2013). An Introduction to Statistical Learning: with Applications in R. New York: Springer-Verlag. Retrieved from https://www.springer.com/us/book/9781461471370
Laflair, G. T., Egbert, J., and Plonsky, L. (2015). A practical guide to bootstrapping descriptive statistics, correlations, t tests, and ANOVAs. In L. Plonsky (Ed.), Advancing Quantitative Methods in Second Language Research, 46-77. New York: Routledge. https://doi.org/10.4324/9781315870908-4
Larson-Hall, J. and Herrington, R. (2010). Improving data analysis in second language acquisition by utilizing modern developments in applied statistics. Applied Linguistics 31 (3), 368-390. https://doi.org/10.1093/applin/amp038
Nikitina, L. and Furuoka, F. (2018). Expanding the methodological arsenal of applied linguistics with a robust statistical procedure. Applied Linguistics. https://doi.org/10.1093/applin/amx026
Plonsky, L. (2013). Study quality in SLA: An assessment of designs, analyses, and reporting practices in quantitative L2 research. Studies in Second Language Acquisition 35 (4), 655-687. https://doi.org/10.1017/S0272263113000399
Plonsky, L. (2014). Study quality in quantitative L2 research (1990-2010): A methodological synthesis and call for reform. The Modern Language Journal 98 (1), 450-470. https://doi.org/10.1111/j.1540-4781.2014.12058.x
Plonsky, L. (2015). Statistical power, p values, descriptive statistics, and effect sizes: A 'back-to-basics' approach to advancing quantitative methods in L2 research. In L. Plonsky (Ed.), Advancing Quantitative Methods in Second Language Research, 23-45. New York: Routledge.
Plonsky, L., Egbert, J., and Laflair, G. T. (2015). Bootstrapping in applied linguistics: Assessing its potential using shared data. Applied Linguistics 36 (5), 591-610. https://doi.org/10.1093/applin/amu001
Plonsky, L. and Ghanbar, H. (2018). Multiple regression in L2 research: A methodological synthesis and guide to interpreting R² values. The Modern Language Journal 102 (4), 713-731. https://doi.org/10.1111/modl.12509
Plonsky, L. and Oswald, F. L. (2017). Multiple regression as a flexible alternative to ANOVA in L2 research. Studies in Second Language Acquisition 39 (3), 579-592. https://doi.org/10.1017/S0272263116000231
R Core Team. (2018). R: A Language and Environment for Statistical Computing. Vienna: R Foundation for Statistical Computing.
© Equinox Publishing Ltd.