A phonetic case study on prosodic variability in suicidal emergency calls

Authors

  • Lauri Tavi University of Eastern Finland
  • Stefan Werner University of Eastern Finland

DOI:

https://doi.org/10.1558/ijsll.39667

Keywords:

prosody, acoustic-phonetic analysis, emergency call, suicidal speech, intoxication

Abstract

Speech prosody has been applied in numerous speech emotion recognition tasks. Yet, especially in forensic speech science, there is still a need for acoustic-phonetic analyses with human evaluation, since many current speech emotion models are trained on speech data in which emotions are treated as constant states and the dynamic influence of the interlocutor is disregarded. During an emergency call, for instance, the caller’s emotional prosody varies with the communication with the emergency operator, which causes problems for existing speech emotion models when analysing individual emergency recordings. In this phonetic case study, prosodic variation was investigated in two suicidal emergency calls: eight prosodic features from two adult male callers were analysed before and after they heard the emergency operator’s offer of help. In addition, a possible linear association between the emergency operator’s and the caller’s prosodic features was evaluated. The results show that caller and operator pitch are negatively correlated (−0.33) and that half of the callers’ prosodic features vary significantly (p < 0.05) after hearing the offer of help.
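
The analyses summarised above (a correlation between operator and caller pitch, and a before/after comparison of caller features around the offer of help) can be outlined in base R, the statistical environment cited in the references below. The snippet is a minimal illustration only: the data frame, column names and simulated values are assumptions for demonstration, not the study’s data or feature set.

    # Minimal sketch in base R with simulated placeholder data (not the study's data).
    # 'calls' holds one mean F0 value (Hz) per turn for caller and operator,
    # plus whether the caller's turn occurred before or after the offer of help.
    set.seed(1)
    calls <- data.frame(
      caller_f0   = rnorm(40, mean = 120, sd = 15),
      operator_f0 = rnorm(40, mean = 180, sd = 10),
      phase       = rep(c("before", "after"), each = 20)
    )

    # Linear association between caller and operator pitch (Pearson's r)
    cor.test(calls$caller_f0, calls$operator_f0, method = "pearson")

    # Non-parametric comparison of a caller feature before vs. after the offer
    wilcox.test(caller_f0 ~ phase, data = calls)

In practice, the prosodic features themselves would first be extracted from the call recordings, for example with Praat (Boersma and Weenink 2017), before being read into R for such tests.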

Author Biographies

  • Lauri Tavi, University of Eastern Finland

    Lauri Tavi, MA, is currently an early-stage researcher at the laboratory of Linguistics and Language Technology, University of Eastern Finland. He has also visited Tallinn University of Technology as a doctoral student and the National Bureau of Investigation Forensic Laboratory, Finland, as a short-term intern. His PhD covers acoustic-phonetic analysis of emotions in authentic emergency calls, and his research interests include forensic phonetics, paralinguistic speaker state recognition, machine learning and speech prosody.

  • Stefan Werner, University of Eastern Finland

    Stefan Werner, PhD, is a lecturer in Linguistics and Language Technology at the University of Eastern Finland, Joensuu. He has held courses and workshops on acoustic analysis, Praat and statistical analysis of speech data at universities in Finland, Estonia, Germany, Switzerland and Japan. His research interests include typical and atypical variation in speech, speech acoustics and speech technology.

References

Alghowinem, S. Goecke, R. Wagner, M. Epps, J. Breakspear, M. and Parker, G. (2013) Detecting depression: a comparison between spontaneous and read speech. Proceedings Acoustics, Speech and Signal Processing (ICASSP) IEEE International Conference 2013, 7547–7551, Vancouver, Canada. https://doi.org/10.1109/ICASSP.2013.6639130

Berninger, K., Hoppe, J. and Milde, B. (2016) Classification of Speaker Intoxication Using a Bidirectional Recurrent Neural Network. International Conference on Text, Speech, and Dialogue: 435–442. Springer, Cham. https://doi.org/10.1007/978-3-319-45510-5_50

Biadsy, F., Wang, W. Y., Rosenberg, A. and Hirschberg, J. (2011) Intoxication detection using phonetic, phonotactic and prosodic cues, Proceedings INTERSPEECH 2011, 3209–3212, Florence, Italy. https://www.isca-speech.org/archive/archive_papers/interspeech_2011/i11_3209.pdf

Boersma, P. and Weenink, D. (2017) Praat: doing phonetics by computer [Computer program]. Version 6.0.36 [available at: http://www.praat.org].

Bone, D., Li, M., Black, M. P. and Narayanan, S. S. (2014) Intoxicated speech detection: A fusion framework with speaker-normalized hierarchical functionals and GMM supervectors. Computer Speech & Language 28(2): 375–391. https://doi.org/10.1016/j.csl.2012.09.004

Brady, J. (2006) The association between alcohol misuse and suicidal behaviour. Alcohol and Alcoholism 41(5): 473–478. https://doi.org/10.1093/alcalc/agl060

Campbell, N. and Mokhtari, P. (2003) Voice quality: the 4th prosodic dimension. Proceedings 15th ICPhS 2003, 2417–2420, Barcelona, Spain.

C-PROM (Antoine Auchlin, U. Genève, Mathieu Avanzi, U. Neuchâtel/Paris X, Jean-Philippe Goldman, U. Genève, Anne Catherine Simon, UC Louvain). Primary data (corpus). Université de Genève, Département de linguistique (UNIGE, Genève CH), Centre de recherche Valibel - Discours et variation (Valibel, Louvain BE), Université de Neuchâtel (UniNE, Neuchâtel CH), Modèles, dynamiques, corpus - UMR 7114 (MoDyCo, Paris FR). Created 2010-06-24. Speech and Language Data Repository (SLDR/ORTOLANG). Identifier hdl:11041/c-prom-000250

Cummins, N., Scherer, S., Krajewski, J., Schnieder, S., Epps, J. and Quatieri, T. F. (2015) A review of depression and suicide risk assessment using speech analysis. Speech Communication 71: 10–49. https://doi.org/10.1016/j.specom.2015.03.004

Cummins, N., Epps, J., Breakspear, M. and Goecke, R. (2011) An investigation of depressed speech detection: Features and normalization. Proceedings INTERSPEECH 2011, 2997–3000, Florence, Italy. https://www.isca-speech.org/archive/archive_papers/interspeech_2011/i11_2997.pdf

Demenko, G. (2008) Voice stress extraction. Proceedings of Speech Prosody 2008, 53–56, Campinas, Brazil. https://pdfs.semanticscholar.org/9d56/57339e1aafb15c81036cfbab636bd8f449ff.pdf

Farrus, M. (2008) Fusing prosodic and acoustic information for speaker recognition. PhD Thesis, Polytechnic University of Catalonia.

Hollien, H., Dejong, G., Martin, C. A., Schwartz, R. and Liljegren, K. (2001) Effects of ethanol intoxication on speech suprasegmentals. The Journal of the Acoustical Society of America 110(6): 3198–3206. https://doi.org/10.1121/1.1413751

Kirchhübel, C., Howard, D. M. and Stedmon A. W. (2011) Acoustic correlates of speech when under stress: research, methods and future directions. International Journal of Speech Language and the Law 18(1): 75–98. https://doi.org/10.1558/ijsll.v18i1.75

Ling, L. E., Grabe, E. and Nolan, F. (2000) Quantitative characterizations of speech rhythm: Syllable-timing in Singapore English. Language and Speech 43(4): 377–401. https://doi.org/10.1177/00238309000430040301

Meyer, P., Buschermohle, E. and Fingscheidt, T. (2018) What Do Classifiers Actually Learn? A Case Study on Emotion Recognition Datasets. Proceedings INTERSPEECH 2018, 262–266, Hyderabad, India. https://www.isca-speech.org/archive/Interspeech_2018/pdfs/1851.pdf

Origlia, A., Cutugno, F. and Galata, V. (2014) Continuous emotion recognition with phonetic syllables. Speech Communication 57: 155–169. https://doi.org/10.1016/j.specom.2013.09.012

Quatieri, T.F. and Malyska, N. (2012) Vocal-source biomarkers for depression: a link to psychomotor activity. Proceedings INTERSPEECH 2012, 1059–1062, Portland, USA. https://www.isca-speech.org/archive/archive_papers/interspeech_2012/i12_1059.pdf

R Core Team (2017) R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. [Available at: www.r-project.org.]

Scherer, S., Pestian, J. and Morency, L. P. (2013) Investigating the speech characteristics of suicidal adolescents. Proceedings Acoustics, Speech and Signal Processing (ICASSP), IEEE International Conference 2013, 709–713, Vancouver, Canada. https://doi.org/10.1109/ICASSP.2013.6637740

Schiel, F. and Heinrich, C. (2015) Disfluencies in the speech of intoxicated speakers. International Journal of Speech, Language & the Law 22(1): 19–33. https://www.bas.uni-muenchen.de/forschung/publikationen/IJSLL_SchielHeinrich_2015.pdf

Schuller, B., Steidl, S., Batliner, A., Schiel, F. and Krajewski, J. (2011) The Interspeech 2011 speaker state challenge. Proceedings INTERSPEECH 2011, 3201–3204, Florence, Italy. https://www.isca-speech.org/archive/archive_papers/interspeech_2011/i11_3201.pdf

Sobin, C. and Sackeim, H. A. (1997) Psychomotor symptoms of depression. American Journal of Psychiatry 154(1): 4–17.

Swain, M., Routray, A. and Kabisatpathy, P. (2018) Databases, features and classifiers for speech emotion recognition: a review. International Journal of Speech Technology 21(1): 93–120. https://doi.org/10.1007/s10772-018-9491-z

Ververidis, D. and Kotropoulos, C. (2006) Emotional speech recognition: Resources, features, and methods. Speech Communication 48(9): 1162–1181. https://doi.org/10.1016/j.specom.2006.04.003

Williamson, J. R., Young, D., Nierenberg, A. A., Niemi, J., Helfer, B. S. and Quatieri, T. F. (2019) Tracking depression severity from audio and video based on speech articulatory coordination. Computer Speech & Language 55: 40–56. https://doi.org/10.1016/j.csl.2018.08.004

Yeh, S.L., Lin, Y.S. and Lee, C.C. (2019) An Interaction-aware Attention Network for Speech Emotion Recognition in Spoken Dialogs. Proceedings IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2019, 6685–6689, Brighton, England. https://doi.org/10.1109/ICASSP.2019.8683293

Yildirim, S., Bulut, M., Lee, C. M., Kazemzadeh, A., Deng, Z., Lee, S., Narayanan, S. and Busso, C. (2004) An Acoustic Study of Emotions Expressed in Speech. Proceedings INTERSPEECH 2004, 2193–2196, Jeju Island, Korea. https://www.isca-speech.org/archive/archive_papers/interspeech_2004/i04_2193.pdf

Published

2020-08-27

Issue

Vol. 27 No. 1 (2020)

Section

Articles

How to Cite

Tavi, L., & Werner, S. (2020). A phonetic case study on prosodic variability in suicidal emergency calls. International Journal of Speech, Language and the Law, 27(1), 59–74. https://doi.org/10.1558/ijsll.39667