Polysemy and word frequency

A replication


  • Koenraad Kuiper University of Canterbury
  • Robert Fromont University of Canterbury
  • Daniel Gerhard University of Canterbury




polysemy, lexical frequency, BNC, WordNet, Zipf


One piece of evidence adduced by George Kingsley Zipf for his eponymous law (Zipf, 1935), and for its explanation in terms of the principle of least effort (Zipf, 1949), is the hypothesis that a word's polysemy is proportional to the square root of its frequency (Levelt, 2013). Pawley (2006), following Zipf, also proposes that 'there is a strong general correlation between frequency and the extent of polysemy'. This paper replicates Zipf's approach, but with data drawn from different sources to those available to Zipf: for word frequency, the Kilgarriff list of most frequent words drawn from the BNC (Kilgarriff, 1995) and, as a measure of polysemy, the WordNet polysemy data for the words in Kilgarriff's list. It also takes note of the syntactic category of lexemes, and it uses more advanced statistical modelling. Zipf's observations are confirmed, with some provisos, and their utility is examined. Explanations for the relationship remain to be established.
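The square-root hypothesis can be read as a power law, m = c · f^0.5, where m is a word's number of senses and f its frequency; on log-log axes this becomes a straight line with slope 0.5. The following is a minimal sketch of how such a slope could be estimated by ordinary least squares. It is an illustration only and not the modelling used in the paper: the function name `fit_power_law_exponent` is hypothetical, and the toy values are synthetic numbers constructed to follow the hypothesis exactly, not BNC or WordNet data.

```python
import math


def fit_power_law_exponent(frequencies, sense_counts):
    """Fit log(senses) = intercept + slope * log(frequency) by least squares.

    Hypothetical helper for illustration; returns (slope, intercept).
    A slope near 0.5 would be consistent with the square-root hypothesis.
    """
    xs = [math.log(f) for f in frequencies]
    ys = [math.log(m) for m in sense_counts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squares of x and sum of cross-products of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Synthetic toy values (an assumption, not corpus data): senses = sqrt(frequency).
toy_frequencies = [1, 4, 16, 64, 256, 1024]
toy_senses = [math.sqrt(f) for f in toy_frequencies]

slope, intercept = fit_power_law_exponent(toy_frequencies, toy_senses)
print(f"estimated exponent: {slope:.3f}")
```

With real data, sense counts are small non-negative integers and many words have a single sense, so a count model would usually be more appropriate than a straight-line fit on logarithms; the sketch only makes the shape of the hypothesis concrete.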

Author Biography

  • Koenraad Kuiper, University of Canterbury

    Professor Emeritus, Linguistics Department, University of Canterbury


Amir, Y. and Sharon, I. (1990). Replication research: A ‘must’ for the scientific advancement of psychology. Journal of Social Behavior and Personality 5 (4): 51–69.

Baayen, R. H., Shaoul, C., Willits, J., and Ramscar, M. (2015). Comprehension without segmentation: A proof of concept with naive discrimination learning. Language, Cognition, and Neuroscience 31 (1): 106–128. https://doi.org/10.1080/23273798.2015.1065336

Baker, M. C. (2003). Lexical Categories: Verbs, Nouns and Adjectives. Cambridge: Cambridge University Press. https://doi.org/10.1017/CBO9780511615047

Barque, L. and Chaumartin, F.-R. (2006). Regular polysemy in WordNet. LDV-Forum 21 (1): 1–14.

Chaplot, D. S., Bhattacharyya, P., and Paranjape, A. (2015). Unsupervised word sense disambiguation using Markov random field and dependency parser. Paper presented at the 29th AAAI Conference on Artificial Intelligence (AAAI-15), Austin, Texas.

Crossley, S., Salsbury, T., and McNamara, D. (2010). The development of polysemy and frequency use in English second language speakers. Language Learning: A Journal of Research in Language Studies 60 (3): 573–605.

Everaert, M. and Bolhuis, J. (2017). The biology of language. Neuroscience and Biobehavioral Reviews 81: 99–102. https://doi.org/10.1016/j.neubiorev.2017.08.005

Grimshaw, J. (1990). Argument Structure. Cambridge, MA: MIT Press.

Hanks, P. (2013). Lexical Analysis: Norms and Exploitations. Cambridge, MA: MIT Press. https://doi.org/10.7551/mitpress/9780262018579.001.0001

Hernández-Fernández, A., Casas, B., Ferrer-i-Cancho, R., and Baixeries, J. (2016). Testing the robustness of laws of polysemy and brevity versus frequency. In P. Král and C. Martín-Vide (Eds) Statistical Language and Speech Processing. SLSP 2016. Lecture Notes in Computer Science, vol. 9918. Cham: Springer. https://doi.org/10.1007/978-3-319-45925-7_2

Katz, J. J. and Fodor, J. A. (1963). The structure of semantic theory. Language 39 (2): 170–210. https://doi.org/10.2307/411200

Kearns, K. (1998). Light verbs in English. Linguistics 34: 53–72. https://doi.org/10.1017/S002222679700683X

Kilgarriff, A. (1995). BNC database and word frequency lists. Retrieved on 24 February 2014 from http://www.kilgarriff.co.uk/BNC_lists/lemma.al

Klepousniotou, E. (2002). The processing of lexical ambiguity: Homonymy and polysemy in the mental lexicon. Brain and Language 81: 205–223. https://doi.org/10.1006/brln.2001.2518

Levelt, W. J. M. (1989). Speaking: From Intention to Articulation. Cambridge, MA: MIT Press.

Levelt, W. J. M. (2013). A History of Psycholinguistics: The Pre-Chomskyan Era. Oxford: Oxford University Press.

Levelt, W. J. M., Roelofs, A., and Meyer, A. S. (1999). A theory of lexical access in speech production. Behavioral & Brain Sciences 22 (1): 1–75. https://doi.org/10.1017/S0140525X99001776

McCullagh, P. and Nelder, J. A. (1989). Generalized Linear Models (2nd ed.). Boca Raton, FL: Chapman & Hall. https://doi.org/10.1007/978-1-4899-3242-6

Miller, G. A., Beckwith, R., Fellbaum, C., Gross, D., and Miller, K. (1992). WordNet: A lexical database for English. Commun. ACM 38: 39–41. https://doi.org/10.1145/219717.219748

Nation, I. S. P. (2008). Teaching Vocabulary: Strategies and Techniques. Boston, MA: Cengage Learning.

Pawley, A. (2006). Where have all the verbs gone? Remarks on the organisation of language with small, closed verb classes. Paper presented at the 11th Biennial Rice University Linguistics Symposium. Austin, Texas.

R Core Team. (2016). R: A language and environment for statistical computing. Retrieved on 28 August 2015 from https://www.R-project.org/

Simons, D. J. (2014). The value of direct replication. Perspectives on Psychological Science 9 (1): 76–80. https://doi.org/10.1177/1745691613514755

Taylor, J. R. (2003). Polysemy’s paradoxes. Language Sciences 25 (6): 637–655. https://doi.org/10.1016/S0388-0001(03)00031-7

Taylor, J. R. (2012). The Mental Corpus: How Language is Represented in the Mind. Oxford: Oxford University Press. https://doi.org/10.1093/acprof:oso/9780199290802.001.0001

Tengi, R. I. (1998). Design and implementation of the WordNet lexical database and searching software. In: C. Fellbaum (Ed.) WordNet: An Electronic Lexical Database, 105–127. Cambridge, MA: MIT Press.

Wittgenstein, L. (1965). Philosophical Investigations. New York: The Macmillan Company.

Yang, C. (2013). Ontogeny and phylogeny of language. Proceedings of the National Academy of Sciences 110 (16): 6324–6327. https://doi.org/10.1073/pnas.1216803110

Yang, C. (2016). The Price of Linguistic Productivity: How Children Learn to Break the Rules of Language. Cambridge, MA: MIT Press.

Zipf, G. K. (1949). Human Behaviour and the Principle of Least Effort. Cambridge, MA: Addison-Wesley.






How to Cite

Kuiper, K., Fromont, R., & Gerhard, D. (2018). Polysemy and word frequency: A replication. Journal of Research Design and Statistics in Linguistics and Communication Science, 4(2), 144-155. https://doi.org/10.1558/jrds.33751