Acoustic correlates of female speech under stress based on /i/-vowel measurements
DOI: https://doi.org/10.1558/ijsll.32506
Keywords: Speech under stress, Acoustic correlates, Emergency calls
Abstract
The purpose of this article is to measure and evaluate commonly identified, yet rather inconsistent, acoustic correlates of speech under stress in authentic emergency call recordings. In this study, ten acoustic parameters are measured from manually segmented /i/-vowels in a set of female emergency call recordings, and hypotheses based on previous studies are tested statistically. The statistical analyses confirm that, compared with the neutral speech group, the speech-under-stress group differs in fundamental frequency, shimmer, harmonicity, Hammarberg index, F1, F2, F3 and formant dispersion, which mostly supports the findings of previous studies. By contrast, jitter and vowel duration show no statistically significant difference between the speech-under-stress group and the neutral group. Furthermore, the results substantiate that stress recognition using different acoustic parameters is feasible from data sets as small as vowel segments; however, the effect of inter-speaker variation must not be underestimated. In future research, a stress detection model for band-limited telephone speech will be created, based on the optimal combination of acoustic parameters.
References
Boersma, P. and Weenink, D. (2016) Praat: doing phonetics by computer [Computer program]. Version 6.0.21. Retrieved 25 September 2016 from http://www.praat.org/
Brockmann, M., Drinnan, M. J., Storck, C. and Carding, P. N. (2011) Reliable jitter and shimmer measurements in voice clinics: the relevance of vowel, gender, vocal intensity, and fundamental frequency effects in a typical clinical task. Journal of Voice 25(1): 44–53. https://doi.org/10.1016/j.jvoice.2009.07.002
Cummins, N., Scherer, S., Krajewski, J., Schnieder, S., Epps, J. and Quatieri, T. F. (2015) A review of depression and suicide risk assessment using speech analysis. Speech Communication 71: 10–49. https://doi.org/10.1016/j.specom.2015.03.004
Dellwo, V., Leemann, A. and Kolly, M.-J. (2015) Rhythmic variability between speakers: articulatory, prosodic and linguistic factors. Journal of the Acoustical Society of America 137(3): 1513–1528. https://doi.org/10.1121/1.4906837
Demenko, G. (2008) Voice stress extraction. Proceedings of Speech Prosody Conference. 6–9 May, Campinas, Brazil: 53–56.
Demenko, G. and Jastrzębska, M. (2012) Analysis of voice stress in call centers conversations. Proceedings of Speech Prosody. Shanghai, China: 183–186.
Farrus, M. (2008) Fusing prosodic and acoustic information for speaker recognition. PhD Thesis, Polytechnic University of Catalonia.
Gafni, C. (2017) Hammarberg index – Praat plugin. Retrieved 5 June 2017 from https://github.com/chengafni/praat
Gałka, J., Grzybowska, J., Igras, M., Jaciow, P., Wajda, K., Witkowski, M. and Ziółko, M. (2015) System supporting speaker identification in emergency call centers. In Proceedings of Interspeech.
Grawunder, S. and Winter, B. (2010) Acoustic correlates of politeness: prosodic and voice quality measures in polite and informal speech of Korean and German speakers. In International Conference for Speech Prosody 5, Chicago.
He, L., Lech, M., Maddage, N. C. and Allen, N. B. (2011) Study of empirical mode decomposition and spectral analysis for stress and emotion classification in natural speech. Biomedical Signal Processing and Control 6(2): 139–146. https://doi.org/10.1016/j.bspc.2010.11.001
Hill, D. R. (2007) Speaker classification concepts: past, present and future. In C. Muller (ed.) Speaker Classification I: Fundamentals, Features, and Methods: 21–46. Berlin: Springer-Verlag. https://doi.org/10.1007/978-3-540-74200-5_2
Hollien, H. (1990) Acoustics of Crime: The New Science of Forensic Phonetics. New York: Plenum. https://doi.org/10.1007/978-1-4899-0673-1
Jessen, M. (2008) Forensic phonetics. Language and Linguistics Compass 2(4): 671–711. https://doi.org/10.1111/j.1749-818X.2008.00066.x
Karlsson, I., Banziger, T., Dankovicova, J., Johnstone, T., Lindberg, J., Melin, H., Nolan, F. and Scherer, K. (2000) Speaker verification with elicited speaking styles in the VeriVox project. Speech Communication 31(2–3): 121–129. https://doi.org/10.1016/S0167-6393(99)00073-4
Kinnunen, T. and Li, H. (2010) An overview of text-independent speaker recognition: from features to supervectors. Speech Communication 52(1): 12–40. https://doi.org/10.1016/j.specom.2009.08.009
Kirchhubel, C. and Howard, D. M. (2013) Detecting suspicious behaviour using speech: acoustic correlates of deceptive speech – an exploratory investigation. Applied Ergonomics 44(5): 694–702. https://doi.org/10.1016/j.apergo.2012.04.016
Kirchhubel, C., Howard, D. M. and Stedmon, A. W. (2011) Acoustic correlates of speech when under stress: research, methods and future directions. International Journal of Speech Language and the Law 18(1): 75–98. https://doi.org/10.1558/ijsll.v18i1.75
Lefter, I., Rothkrantz, L. J., Van Leeuwen, D. A. and Wiggers, P. (2011) Automatic stress detection in emergency (telephone) calls. International Journal of Intelligent Defence Support Systems 4(2): 148–168. https://doi.org/10.1504/IJIDSS.2011.039547
Lennes, M. (2003) Collect formant data from files – Praat script. Retrieved 1 September 2016 from http://www.helsinki.fi/~lennes/praat-scripts/
McCloy, D. R. (2012) Normalizing and plotting vowels with the phonR package. Technical Reports of the UW Linguistic Phonetics Laboratory: 1–8. Retrieved 8 June from http://dan.mccloy.info/pubs/McCloy2012_phonR.pdf
Niemi-Laitinen, T. (1999) Puhujantunnistus rikostutkinnassa [Speaker recognition in criminal investigation]. Licentiate thesis, Helsingin yliopisto (University of Helsinki), Helsinki.
Patil, A. S. and Hansen, J. H. L. (2007) Speech under stress: analysis, modelling and recognition. In C. Muller (ed.) Speaker Classification I: Fundamentals, Features, and Methods: 108–137. Berlin: Springer-Verlag.
R Core Team (2014) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL http://www.R-project.org/.
Rasanen, O., Doyle, G. and Frank, M. C. (2015) Unsupervised word discovery from speech using automatic segmentation into syllable-like units. In Proceedings of Interspeech.
Scherer, K. R., Grandjean, D., Johnstone, T., Klasmeyer, G. and Banziger, T. (2002) Acoustic correlates of task load and stress. In Proceedings of Interspeech.
Schroder, M. (2001) Emotional speech synthesis: a review. In Proceedings of Interspeech.
Sigmund, M. (2006) Introducing the database ExamStress for speech under stress. In Proceedings of the 7th Nordic Signal Processing Symposium-NORSIG 2006: 290–293. https://doi.org/10.1109/NORSIG.2006.275258
Sondhi, S., Khan, M., Vijay, R. and Salhan, A. K. (2015) Vocal indicators of emotional stress. International Journal of Computer Applications 122(15): 38–45. https://doi.org/10.5120/21780-5056
Steeneken, H. J. and Hansen, J. H. (1999) Speech under stress conditions: overview of the effect on speech production and on system performance. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing 15–19 March: 2079–2082. https://doi.org/10.1109/ICASSP.1999.758342
Teixeira, J. P. and Fernandes, P. O. (2014) Jitter, shimmer and HNR classification within gender, tones and vowels in healthy voices. Procedia Technology 16: 1228–1237. https://doi.org/10.1016/j.protcy.2014.10.138
Van Lierde, K., van Heule, S., De Ley, S., Mertens, E. and Claeys, S. (2009) Effect of psychological stress on female vocal quality. Folia Phoniatrica et Logopaedica 61(2): 105–111. https://doi.org/10.1159/000209273
Womack, B. D. and Hansen, J. H. L. (1996) Classification of speech under stress using target driven features. Speech Communication 20(1–2): 131–150. https://doi.org/10.1016/S0167-6393(96)00049-0
Xu, Y. (2013) ProsodyPro – a tool for large-scale systematic prosody analysis. In Proceedings of Tools and Resources for the Analysis of Speech Prosody (TRASP 2013), Aix-en-Provence, France: 7–10.
Zhou, G., Hansen, J. H. and Kaiser, J. F. (2001) Nonlinear feature based classification of speech under stress. IEEE Transactions on Speech and Audio Processing 9(3): 201–216. https://doi.org/10.1109/89.905995
Published: 2017-12-20
Section: Articles
How to Cite
Tavi, L. (2017). Acoustic correlates of female speech under stress based on /i/-vowel measurements. International Journal of Speech, Language and the Law, 24(2), 227-241. https://doi.org/10.1558/ijsll.32506
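The abstract names several voice-quality and formant measures without defining them. As an illustrative aid only, the following minimal Python sketch implements commonly used definitions; it is not the author's measurement procedure, which relied on Praat and a Praat plugin for the Hammarberg index. Local jitter and local shimmer follow the textbook definitions used in the Praat manual (without Praat's period-selection constraints), formant dispersion is taken as the average spacing between adjacent formants, and the Hammarberg index as the maximum spectral level in 0–2 kHz minus the maximum in 2–5 kHz. The function names, input formats (period durations, per-period peak amplitudes, formant values in ascending order, a precomputed spectrum) and band edges are assumptions.

```python
from statistics import mean


def local_jitter(periods_s):
    """Local jitter: mean absolute difference between consecutive glottal
    periods, divided by the mean period. Returned as a ratio (x100 for %)."""
    if len(periods_s) < 2:
        raise ValueError("need at least two periods")
    diffs = [abs(b - a) for a, b in zip(periods_s, periods_s[1:])]
    return mean(diffs) / mean(periods_s)


def local_shimmer(peak_amplitudes):
    """Local shimmer: mean absolute difference between the peak amplitudes of
    consecutive periods, divided by the mean amplitude (ratio)."""
    if len(peak_amplitudes) < 2:
        raise ValueError("need at least two amplitudes")
    diffs = [abs(b - a) for a, b in zip(peak_amplitudes, peak_amplitudes[1:])]
    return mean(diffs) / mean(peak_amplitudes)


def formant_dispersion(formants_hz):
    """Average spacing between adjacent formants, given in ascending order
    (F1, F2, F3, ...). Equivalent to (F_N - F_1) / (N - 1)."""
    if len(formants_hz) < 2:
        raise ValueError("need at least two formants")
    gaps = [upper - lower for lower, upper in zip(formants_hz, formants_hz[1:])]
    return mean(gaps)


def hammarberg_index(freqs_hz, levels_db, split_hz=2000.0, upper_hz=5000.0):
    """Maximum spectral level (dB) in 0-2 kHz minus maximum level in 2-5 kHz.
    Band edges are assumed defaults, exposed as parameters."""
    low = [lv for f, lv in zip(freqs_hz, levels_db) if 0.0 <= f <= split_hz]
    high = [lv for f, lv in zip(freqs_hz, levels_db) if split_hz < f <= upper_hz]
    if not low or not high:
        raise ValueError("spectrum must cover both frequency bands")
    return max(low) - max(high)


if __name__ == "__main__":
    # Illustrative toy values only, not data from the study.
    print(local_jitter([0.00400, 0.00402, 0.00401, 0.00403]))
    print(local_shimmer([0.80, 0.82, 0.79, 0.81]))
    print(formant_dispersion([310.0, 2790.0, 3310.0]))
    print(hammarberg_index([500, 1500, 2500, 4000], [60, 55, 40, 35]))
```

Because the differences between adjacent formants telescope, formant dispersion over F1–F3 reduces to (F3 − F1)/2. The abstract does not state whether the study computed dispersion over F1–F3 or over additional formants, so the number of formants is left to the caller.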