Investigating Lexical Effects in Syntax with Regularized Regression (Lasso)




Keywords: cross-validation, regularization, lasso, machine learning, corpus linguistics, collostructional analysis, distinctive collexeme analysis, overfitting


Within usage-based theory, notably in construction grammar though also elsewhere, the role of the lexicon and of lexically specific patterns in morphosyntax is well recognized. The standard methods for investigating grammar empirically, however, are not always well suited to capturing the details of such lexical effects. In this short article, we propose a method from machine learning, regularized regression (Lasso) with k-fold cross-validation, and compare its performance with that of a Distinctive Collexeme Analysis.
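For readers unfamiliar with the technique, the following is a sketch of the standard lasso objective with the penalty chosen by k-fold cross-validation. It assumes a binary outcome (for instance, a choice between two competing constructions) and a predictor vector that includes one indicator per lexeme. These assumptions, and all symbols below, are illustrative and are not taken from the article itself, which may parameterize the model differently.

```latex
% Assumed notation: y_i \in \{0,1\} is the observed variant for token i,
% x_i is its predictor vector (including lexeme indicators),
% \lambda \ge 0 is the penalty weight, and k is the number of folds.
\begin{align*}
  \eta_i &= \beta_0 + x_i^{\top}\beta \\
  \hat{\beta}(\lambda) &= \arg\min_{\beta_0,\,\beta}
    \left\{ -\frac{1}{n}\sum_{i=1}^{n}
      \Bigl[\, y_i\,\eta_i - \log\bigl(1 + e^{\eta_i}\bigr) \Bigr]
      + \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert \right\} \\
  \hat{\lambda} &= \arg\min_{\lambda}\;
    \frac{1}{k}\sum_{f=1}^{k}
      \mathrm{Dev}_f\!\bigl(\hat{\beta}^{(-f)}(\lambda)\bigr)
\end{align*}
% \hat{\beta}^{(-f)}(\lambda) is the fit obtained with fold f held out;
% \mathrm{Dev}_f is the deviance of that fit on the held-out fold f.
```

The absolute-value penalty shrinks some coefficients exactly to zero, so under these assumptions only lexemes whose effect survives the penalty retain a non-zero estimate, and cross-validation guards against choosing a penalty that overfits the sample.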

Author Biographies

Freek Van de Velde, KU Leuven

Freek Van de Velde (KU Leuven) is an associate professor of Dutch linguistics and historical linguistics. His research focuses on quantitative approaches to variation and change and on evolutionary linguistics. He received his PhD in 2009 with a dissertation on the diachrony of the noun phrase.

Dirk Pijpops, Université de Liège

Dirk Pijpops (University of Liège) works as a lecturer of Dutch. He is affiliated with the research unit Lilith. His research focuses on language variation and change, which he studies in order to answer questions in usage-based theoretical linguistics. Methodologically, his work builds on quantitative corpus analyses and agent-based computer simulations. He received his PhD in 2019 at the University of Leuven, with a thesis on argument structure variation in Dutch.


Van de Velde, F., & Pijpops, D. (2021). Investigating Lexical Effects in Syntax with Regularized Regression (Lasso). Journal of Research Design and Statistics in Linguistics and Communication Science, 6(2), 166–199.