Adjusting Regression Models for Overfitting in Second Language Research


  • Phillip Hamrick



Keywords: regression modeling, validation, bootstrap


Regression modeling is an increasingly important quantitative tool for second language (L2) research. Although superior in many ways to more traditional methods such as ANOVA, regression modeling is still vulnerable to problems in how it is applied, ranging from small sample sizes to a lack of screening for outliers and influential data points (Plonsky and Ghanbar, 2018). Because these problems are common in L2 research, existing studies using regression may overfit their data, possibly inflating effect size estimates. These issues can be partially alleviated with robust statistical procedures such as validation. This paper provides L2 researchers with an overview of these issues and an instructive look at one robust validation method: bootstrapping.
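To make the idea of bootstrap validation concrete, the following is a minimal sketch of one common variant, the optimism-corrected R² for an ordinary least squares model. It is not the procedure used in the paper, which is not reproduced on this page. The function names (`fit_ols`, `r_squared`, `optimism_corrected_r2`), the simulated data, and the choice of 500 resamples are all assumptions made for illustration.

```python
import numpy as np


def fit_ols(X, y):
    """Least-squares coefficients for y ~ X (X already contains an intercept column)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


def r_squared(X, y, beta):
    """R^2 of the predictions X @ beta, evaluated against y."""
    resid = y - X @ beta
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return 1.0 - ss_res / ss_tot


def optimism_corrected_r2(X, y, n_boot=500, seed=0):
    """Return (apparent R^2, optimism-corrected R^2). Hypothetical helper."""
    rng = np.random.default_rng(seed)
    n = len(y)
    apparent = r_squared(X, y, fit_ols(X, y))
    optimism = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)              # resample cases with replacement
        beta_b = fit_ols(X[idx], y[idx])              # refit on the bootstrap sample
        r2_boot = r_squared(X[idx], y[idx], beta_b)   # fit on the data it was trained on
        r2_orig = r_squared(X, y, beta_b)             # same coefficients on the original data
        optimism.append(r2_boot - r2_orig)
    return apparent, apparent - float(np.mean(optimism))


# Simulated (not real) data: 30 learners, 5 predictors, only the first one matters.
rng = np.random.default_rng(42)
n, p = 30, 5
predictors = rng.normal(size=(n, p))
y = 0.5 * predictors[:, 0] + rng.normal(size=n)
X = np.column_stack([np.ones(n), predictors])         # add intercept column

apparent, corrected = optimism_corrected_r2(X, y)
print(f"apparent R^2 = {apparent:.3f}, optimism-corrected R^2 = {corrected:.3f}")
```

With a small sample and several predictors, the corrected value will typically fall below the apparent one. That gap is the kind of inflation the abstract describes.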


Abrahamsson, N. and Hyltenstam, K. (2009). Age of onset and nativelikeness in a second language: Listener perception versus linguistic scrutiny. Language Learning 59 (2), 249-306.

DeKeyser, R. M. (2000). The robustness of critical period effects in second language acquisition. Studies in Second Language Acquisition 22 (4), 499-533.

Egbert, J. and Plonsky, L. (in press). Bootstrapping techniques. In S. T. Gries and M. Paquot (Eds.), A Practical Handbook of Corpus Linguistics. New York: Springer.

James, G., Witten, D., Hastie, T., and Tibshirani, R. (2013). An Introduction to Statistical Learning: with Applications in R. New York: Springer-Verlag.

Laflair, G. T., Egbert, J., and Plonsky, L. (2015). A practical guide to bootstrapping descriptive statistics, correlations, t tests, and ANOVAs. In L. Plonsky (Ed.), Advancing Quantitative Methods in Second Language Research, 46-77. New York: Routledge.

Larson-Hall, J. and Herrington, R. (2010). Improving data analysis in second language acquisition by utilizing modern developments in applied statistics. Applied Linguistics 31 (3), 368-390.

Nikitina, L. and Furuoka, F. (2018). Expanding the methodological arsenal of applied linguistics with a robust statistical procedure. Applied Linguistics.

Plonsky, L. (2013). Study quality in SLA: An assessment of designs, analyses, and reporting practices in quantitative L2 research. Studies in Second Language Acquisition 35 (4), 655-687.

Plonsky, L. (2014). Study quality in quantitative L2 research (1990-2010): A methodological synthesis and call for reform. The Modern Language Journal 98 (1), 450-470.

Plonsky, L. (2015). Statistical power, p values, descriptive statistics, and effect sizes: A 'back-to-basics' approach to advancing quantitative methods in L2 research. In L. Plonsky (Ed.), Advancing Quantitative Methods in Second Language Research, 23-45. New York: Routledge.

Plonsky, L., Egbert, J., and Laflair, G. T. (2015). Bootstrapping in applied linguistics: Assessing its potential using shared data. Applied Linguistics 36 (5), 591-610.

Plonsky, L. and Ghanbar, H. (2018). Multiple regression in L2 research: A methodological synthesis and guide to interpreting R² values. The Modern Language Journal 102 (4), 713-731.

Plonsky, L. and Oswald, F. L. (2017). Multiple regression as a flexible alternative to ANOVA in L2 research. Studies in Second Language Acquisition 39 (3), 579-592.

R Core Team. (2018). R: A Language and Environment for Statistical Computing. Vienna: R Foundation for Statistical Computing.



How to Cite

Hamrick, P. (2019). Adjusting Regression Models for Overfitting in Second Language Research. Journal of Research Design and Statistics in Linguistics and Communication Science, 5(1-2), 107–122.