Searching for extended units of meaning — and what to do when you find them


Michael Rundell, Lexical Computing Ltd.



Extended units of meaning, Collocation, Colligation, Semantic prosody, Multi-word sketch, Longest commonest match


Two of the key outcomes of corpus-linguistic research over the past 30 years have been the development of the idea that meanings are mostly constructed through context (undermining traditional notions of the individual word as an autonomous bearer of meaning), and the discovery that recurrence and regularity (our tendency to employ a limited number of conventionalized ways of expressing ideas) are essential features of the language system. Both findings have had a major impact on our understanding of how language works, and both have influenced the content of dictionary entries, contributing, for example, to improved word sense disambiguation and to a greater emphasis on phraseology and collocation. However, there is still much to do. Ever-larger corpora and more powerful corpus-query tools reveal areas where we can further improve our description of languages, and thus provide better resources for users. In addition, the migration of dictionaries to digital media (removing space constraints) opens up new opportunities for doing this. In a characteristically far-sighted paper (Sinclair, Textus 9(1): 75–106, 1996), John Sinclair broadened the search for what he called “units of meaning” by investigating longer strings of words and identifying recurrent, and often quite extended, patterns of usage. Using this as a starting point, I will look at other examples in corpus data of the kinds of patterning Sinclair discussed, and show how current corpus-querying systems can help us identify these extended units of meaning. Finally, I will speculate about whether dictionaries should aim to describe these longer units, and if so, how this might work in practice.


Biber, D., S. Johansson, G. Leech, S. Conrad, and E. Finegan. 1999. Longman Grammar of Spoken and Written English. London: Pearson Education.

Convery, C., P. Ó Mianáin, M. Ó Raghallaigh, S. Atkins, A. Kilgarriff, and M. Rundell. 2010. The DANTE Database (Database of ANalysed Texts of English). In Proceedings of the XIV EURALEX Congress, ed. Anne Dykstra and Tanneke Schoonheim. Leeuwarden: Fryske Akademy.

Cowie, A.P. 1999. English Dictionaries for Foreign Learners: A History. Oxford: Oxford University Press.

Hanks, P.W. 2013. Lexical Analysis: Norms and Exploitations. Cambridge: MIT Press.

Halliday, M.A.K. 1966. Lexis as a Linguistic Level. In In Memory of J. R. Firth, ed. C.E. Bazell, J.C. Catford, M.A.K. Halliday, and R.H. Robins, 148–162. London: Longman.

Hoey, M. 2005. Lexical Priming: A New Theory of Words and Language. London: Routledge.

Johnson, S. 1755. Preface to A Dictionary of the English Language. Edited by Jack Lynch.

Kilgarriff, A., P. Rychly, P. Smrz, and D. Tugwell. 2004. The Sketch Engine. In Proceedings of the Eleventh Euralex Congress, ed. Geoffrey Williams and Sandra Vessier, 105–116. France: UBS Lorient.

Kilgarriff, A., V. Baisa, P. Rychlý, and M. Jakubíček. 2015. Longest–commonest Match. In Electronic Lexicography in the 21st Century: Linking Lexical Data in the Digital Age. Proceedings of the eLex 2015 conference, ed. I. Kosem, M. Jakubíček, J. Kallas, and S. Krek, 397–404. Ljubljana/Brighton

Rundell, M. 2015. From Print to Digital: Implications for Dictionary Policy and Lexicographic Conventions. Lexikos 25: 301–322.

Rundell, M., and A. Kilgarriff. 2011. Automating the Creation of Dictionaries: Where Will It All End? In A Taste for Corpora. A tribute to Professor Sylviane Granger, ed. F. Meunier, S. De Cock, G. Gilquin, and M. Paquot, 257–281. Amsterdam: Benjamins.

Sinclair, J.M. 1991. Corpus, Concordance, Collocation. Oxford: Oxford University Press.

Sinclair, J.M. 1996. The Search for Units of Meaning. Textus 9 (1): 75–106.

Sinclair, J.M. 1998. The Lexical Item. In Contrastive Lexical Semantics, ed. E. Weigand, 1–24. Amsterdam: Benjamins.

Sinclair, J.M. 2007/2010. Defining the Definiendum. In A Way with Words: Recent Advances in Lexical Theory and Analysis - A Festschrift for Patrick Hanks, ed. G-M. de Schryver, 37–47. Kampala: Menha Publishers.

Summers, D. (ed.). 1993. Longman Language Activator. London: Longman.



How to Cite

Rundell, M. (2018). Searching for extended units of meaning — and what to do when you find them. Lexicography, 5(1), 5–21.