Adjusting Regression Models for Overfitting in Second Language Research

Authors

  • Phillip Hamrick

DOI:

https://doi.org/10.1558/jrds.38374

Keywords:

regression modeling, validation, bootstrap

Abstract

Regression modeling is an increasingly important quantitative tool for second language (L2) research. While superior in many ways to more traditional methods, such as ANOVA, regression modeling, like all procedures, still has limitations in practice, ranging from small sample sizes to a lack of screening for outliers and influential data points (Plonsky and Ghanbar, 2018). Because these limitations are common in L2 research, existing studies using regression may overfit their data, perhaps inflating effect size estimates. These issues can be partially alleviated via robust statistics, such as validation. This paper provides L2 researchers with an overview of these issues and an instructive look at one robust validation method: bootstrapping.
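
As a rough illustration of the kind of bootstrap validation the abstract refers to, the sketch below estimates how much an ordinary least-squares R^2 is inflated by overfitting and subtracts that "optimism" from the apparent value. It is not the article's own code (the reference list suggests the article works in R). The function names, the simulated data, and the choice of 500 resamples are all assumptions made for illustration.

```python
import numpy as np


def fit_ols(X, y):
    """Least-squares coefficients for design matrix X (intercept column included)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


def r_squared(X, y, beta):
    """Proportion of variance in y explained by the predictions X @ beta."""
    resid = y - X @ beta
    return 1.0 - float(resid @ resid) / float(((y - y.mean()) ** 2).sum())


def optimism_corrected_r2(X, y, n_boot=500, seed=0):
    """Return (apparent R^2, bootstrap optimism-corrected R^2)."""
    rng = np.random.default_rng(seed)
    n = len(y)
    X1 = np.column_stack([np.ones(n), X])  # prepend intercept column
    apparent = r_squared(X1, y, fit_ols(X1, y))
    optimism = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample cases with replacement
        beta_b = fit_ols(X1[idx], y[idx])
        # Fit on the resample minus fit of the same coefficients on the original data
        optimism[b] = r_squared(X1[idx], y[idx], beta_b) - r_squared(X1, y, beta_b)
    return apparent, apparent - optimism.mean()


# Simulated data (assumption): 30 learners, 5 predictors, only the first
# predictor is actually related to the outcome.
rng = np.random.default_rng(42)
X = rng.normal(size=(30, 5))
y = 0.5 * X[:, 0] + rng.normal(size=30)

apparent, corrected = optimism_corrected_r2(X, y)
print(f"apparent R^2 = {apparent:.3f}, optimism-corrected R^2 = {corrected:.3f}")
```

With a small sample and several predictors unrelated to the outcome, the corrected value would be expected to fall below the apparent one. The gap between the two is one way to gauge how far a reported effect size may be inflated.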

References

Abrahamsson, N. and Hyltenstam, K. (2009). Age of onset and nativelikeness in a second language: Listener perception versus linguistic scrutiny. Language Learning 59 (2), 249-306. https://doi.org/10.1111/j.1467-9922.2009.00507.x

DeKeyser, R. M. (2000). The robustness of critical period effects in second language acquisition. Studies in Second Language Acquisition 22 (4), 499-533.

Egbert, J. and Plonsky, L. (in press). Bootstrapping techniques. In S. T. Gries and M. Paquot (Eds.), A Practical Handbook of Corpus Linguistics. New York: Springer.

James, G., Witten, D., Hastie, T., and Tibshirani, R. (2013). An Introduction to Statistical Learning: with Applications in R. New York: Springer-Verlag. Retrieved from //www.springer.com/us/book/9781461471370
https://doi.org/10.1007/978-1-4614-7138-7

Laflair, G. T., Egbert, J., and Plonsky, L. (2015). A practical guide to bootstrapping descriptive statistics, correlations, t tests, and ANOVAs. In L. Plonsky (Ed.), Advancing Quantitative Methods in Second Language Research, 46-77. New York: Routledge. https://doi.org/10.4324/9781315870908-4

Larson-Hall, J. and Herrington, R. (2010). Improving data analysis in second language acquisition by utilizing modern developments in applied statistics. Applied Linguistics 31 (3), 368-390. https://doi.org/10.1093/applin/amp038

Nikitina, L. and Furuoka, F. (2018). Expanding the methodological arsenal of applied linguistics with a robust statistical procedure. Applied Linguistics. https://doi.org/10.1093/applin/amx026

Plonsky, L. (2013). Study quality in SLA: An assessment of designs, analyses, and reporting practices in quantitative L2 research. Studies in Second Language Acquisition 35 (4), 655-687. https://doi.org/10.1017/S0272263113000399

Plonsky, L. (2014). Study quality in quantitative L2 research (1990-2010): A methodological synthesis and call for reform. The Modern Language Journal 98 (1), 450-470. https://doi.org/10.1111/j.1540-4781.2014.12058.x

Plonsky, L. (2015). Statistical power, p values, descriptive statistics, and effect sizes: A 'back-to-basics' approach to advancing quantitative methods in L2 research. In L. Plonsky (Ed.), Advancing Quantitative Methods in Second Language Research, 23-45. New York: Routledge.
https://doi.org/10.4324/9781315870908-3

Plonsky, L., Egbert, J., and Laflair, G. T. (2015). Bootstrapping in applied linguistics: Assessing its potential using shared data. Applied Linguistics 36 (5), 591-610. https://doi.org/10.1093/applin/amu001

Plonsky, L. and Ghanbar, H. (2018). Multiple regression in L2 research: A methodological synthesis and guide to interpreting R² values. The Modern Language Journal 102 (4), 713-731. https://doi.org/10.1111/modl.12509

Plonsky, L. and Oswald, F. L. (2017). Multiple regression as a flexible alternative to ANOVA in L2 research. Studies in Second Language Acquisition 39 (3), 579-592. https://doi.org/10.1017/S0272263116000231

R Core Team. (2018). R: A Language and Environment for Statistical Computing. Vienna: R Foundation for Statistical Computing.

Published

2019-12-19

How to Cite

Hamrick, P. (2019). Adjusting Regression Models for Overfitting in Second Language Research. Journal of Research Design and Statistics in Linguistics and Communication Science, 5(1-2), 107–122. https://doi.org/10.1558/jrds.38374

Section

Articles