Acoustic correlates of female speech under stress based on /i/-vowel measurements


  • Lauri Tavi, University of Eastern Finland



Keywords: speech under stress, acoustic correlates, emergency calls


This article measures and evaluates commonly reported, yet rather inconsistent, acoustic correlates of speech under stress in authentic emergency call recordings. Ten acoustic parameters are measured from manually segmented /i/-vowels, and hypotheses based on previous studies are statistically tested on a set of female emergency call recordings. The statistical analyses confirm that, compared with the neutral speech group, the speech under stress group differs in fundamental frequency, shimmer, harmonicity, Hammarberg index, F1, F2, F3 and formant dispersion, which mostly supports the findings of previous studies. Conversely, jitter and vowel duration show no statistically significant difference between the two groups. Furthermore, the results substantiate that stress recognition using different acoustic parameters is feasible from data sets as small as vowel segments; however, the effect of inter-speaker variation must not be underestimated. In future research, a stress detection model for band-limited telephone speech, based on the optimal combination of acoustic parameters, will be created.
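
Three of the parameters named above can be illustrated with a minimal sketch. It assumes widely used textbook definitions (local jitter and local shimmer as the mean absolute difference between consecutive glottal periods or peak amplitudes, divided by their mean; formant dispersion as the average spacing between adjacent formants), which are not necessarily the exact definitions applied in the study. The function names and all input values are hypothetical and serve only as illustration; they are not data from the article.

```python
from statistics import mean


def local_perturbation(values):
    """Mean absolute difference between consecutive values, divided by the mean value.

    Applied to glottal periods this gives local jitter; applied to peak
    amplitudes it gives local shimmer (both as proportions, not percent).
    """
    if len(values) < 2:
        raise ValueError("need at least two values")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return mean(diffs) / mean(values)


def formant_dispersion(formants_hz):
    """Average spacing in Hz between adjacent formants (e.g. F1, F2, F3)."""
    if len(formants_hz) < 2:
        raise ValueError("need at least two formants")
    gaps = [hi - lo for lo, hi in zip(formants_hz, formants_hz[1:])]
    return mean(gaps)


# Hypothetical measurements from a single /i/ segment (illustrative only).
periods_s = [0.0050, 0.0052, 0.0049, 0.0051]   # consecutive glottal periods
amplitudes = [0.80, 0.78, 0.82, 0.79]          # consecutive peak amplitudes
formants_hz = [350.0, 2500.0, 3100.0]          # F1, F2, F3

jitter = local_perturbation(periods_s)
shimmer = local_perturbation(amplitudes)
dispersion = formant_dispersion(formants_hz)

print(f"jitter (local):     {jitter:.4f}")
print(f"shimmer (local):    {shimmer:.4f}")
print(f"formant dispersion: {dispersion:.1f} Hz")
```

With three formants the dispersion reduces to (F3 − F1) / 2, so it reflects only the overall spread of the formants and not the position of F2.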

Author Biography

Lauri Tavi, University of Eastern Finland

School of Humanities, Grant Researcher





How to Cite

Tavi, L. (2017). Acoustic correlates of female speech under stress based on /i/-vowel measurements. International Journal of Speech, Language and the Law, 24(2), 227–241.