A phonetic case study on prosodic variability in suicidal emergency calls
DOI: https://doi.org/10.1558/ijsll.39667

Keywords: prosody, acoustic-phonetic analysis, emergency call, suicidal speech, intoxication

Abstract
Speech prosody has been applied in numerous speech emotion recognition tasks. Yet a need for acoustic-phonetic analyses with human evaluation remains, especially in forensic speech science, because many current speech emotion models are trained on speech data in which emotions are treated as constant states and the dynamic influence of the interlocutor is disregarded. During an emergency call, for instance, the caller’s emotional prosody varies with the communication with the emergency operator, which causes problems for existing speech emotion models when analysing individual emergency recordings. In this phonetic case study, prosodic variation was investigated in two suicidal emergency calls: eight prosodic features from two adult male callers were analysed before and after the callers heard the emergency operators’ offer of help. In addition, a possible linear association between the emergency operator’s and the caller’s prosodic features was evaluated. The results show that caller and operator pitch are negatively correlated (−0.33) and that half of the callers’ prosodic features vary significantly (p < 0.05) after hearing the offer of help.
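To make the two analyses in the abstract concrete, the sketch below shows, under stated assumptions, how such a comparison could be run. It is not the authors’ pipeline: it assumes utterance-level mean pitch values have already been extracted (for example with Praat), uses Pearson’s r for the operator–caller association, and uses a Mann–Whitney U test for the before/after comparison, since the abstract does not name the specific tests; all variable names and values are hypothetical placeholders.

```python
# A minimal sketch (not the authors' pipeline) of the two analyses
# described in the abstract. All values below are hypothetical
# placeholders standing in for utterance-level measurements that
# would be extracted with a tool such as Praat.
import numpy as np
from scipy.stats import pearsonr, mannwhitneyu

# (1) Linear association between operator and caller prosody:
# hypothetical mean F0 (Hz) per adjacent operator/caller turn pair.
operator_f0 = np.array([118.0, 124.5, 121.2, 130.8, 127.3, 119.6])
caller_f0   = np.array([142.1, 133.7, 138.9, 125.4, 129.0, 140.2])
r, p_assoc = pearsonr(operator_f0, caller_f0)
print(f"operator-caller pitch correlation: r = {r:.2f} (p = {p_assoc:.3f})")

# (2) Before/after comparison of one caller feature, e.g. mean F0
# per utterance before vs. after the operator's offer of help.
# The abstract does not name the test; Mann-Whitney U is one common
# choice when the two samples are unpaired and possibly non-normal.
f0_before = np.array([141.0, 138.2, 145.6, 139.9, 143.3])
f0_after  = np.array([128.4, 131.0, 126.7, 133.5, 129.8, 127.2])
u, p_diff = mannwhitneyu(f0_before, f0_after, alternative="two-sided")
print(f"before vs. after offer of help: U = {u:.1f} (p = {p_diff:.3f})")
```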