Large-sample confidence intervals of information-theoretic measures in linguistics


  • Ryan Ka Yau Lai University of California Santa Barbara
  • Youngah Do The University of Hong Kong



Keywords: information-theoretic measures, entropy, Kullback-Leibler divergence, mutual information


This article explores a method of constructing confidence bounds for information-theoretic measures in linguistics, such as entropy, Kullback-Leibler divergence (KLD), and mutual information. We show that a useful measure of uncertainty can be derived from two simple statistical principles: the asymptotic distribution of the maximum likelihood estimator (MLE) and the delta method. Three case studies from phonology and corpus linguistics demonstrate how to apply the method and examine its robustness to violations of its assumptions that are common in linguistic data, such as insufficient sample size and non-independence of data points.
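To make the idea concrete, the following is a minimal sketch of a delta-method interval for entropy. It is not taken from the article. It assumes the observations are independent draws from a fixed categorical distribution, uses the plug-in (MLE) estimate of each probability, and relies on the standard large-sample variance of plug-in entropy, (Σ p_i (log p_i)² − H²) / n. The function name `entropy_ci` and its parameters are hypothetical, chosen for illustration only.

```python
import math
from statistics import NormalDist


def entropy_ci(counts, level=0.95):
    """Plug-in entropy (in nats) with a large-sample Wald interval.

    Assumes i.i.d. multinomial sampling; illustrative sketch only.
    Returns (estimate, lower bound, upper bound).
    """
    n = sum(counts)
    # MLE of the category probabilities; zero counts contribute 0 to entropy
    probs = [c / n for c in counts if c > 0]
    logs = [math.log(p) for p in probs]

    # Plug-in entropy estimate H = -sum p_i log p_i
    h = -sum(p * l for p, l in zip(probs, logs))

    # Delta-method variance: (sum p_i (log p_i)^2 - H^2) / n
    second_moment = sum(p * l * l for p, l in zip(probs, logs))
    variance = max(0.0, second_moment - h * h) / n
    se = math.sqrt(variance)

    # Two-sided normal quantile for the requested confidence level
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return h, h - z * se, h + z * se
```

For example, `entropy_ci([70, 20, 10])` returns the entropy estimate for three categories observed 70, 20, and 10 times, together with approximate 95% bounds. When the estimated distribution is exactly uniform, the first-order variance is zero and the interval collapses to a point, which is one case where this large-sample approximation is not informative.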

Author Biographies

Ryan Ka Yau Lai, University of California Santa Barbara

Ryan Ka Yau Lai is a PhD student in the Department of Linguistics, University of California, Santa Barbara, CA, USA.

Youngah Do, The University of Hong Kong

Youngah Do is Assistant Professor in the Department of Linguistics of The University of Hong Kong.


How to Cite

Lai, R. K. Y., & Do, Y. (2020). Large-sample confidence intervals of information-theoretic measures in linguistics. Journal of Research Design and Statistics in Linguistics and Communication Science, 6(1), 19–54.