Individual patterns of disfluency across speaking styles
a forensic phonetic investigation of Standard Southern British English
DOI:
https://doi.org/10.1558/ijsll.37241
Keywords:
fluency behaviour, disfluency features, TOFFA, individual differences, speaker-specificity, speaking style
Abstract
Features of speech related to fluency such as filled and silent pauses, sound prolongations, repetitions and self-interruptions exhibit considerable variation among speakers, yet the speaker-specificity of such features has received little attention in forensic phonetic research. The present study investigates the extent to which individual differences in disfluency behaviour are preserved across different speaking styles, a key concern for forensic speaker comparison cases. Disfluency phenomena in the speech of 20 male speakers of Standard Southern British English undertaking a simulated police interview task are compared with the occurrence of the same set of phenomena in the speech of the same speakers participating in a telephone conversation with an 'accomplice'. The speakers' disfluency features are analysed using TOFFA, the 'Taxonomy of Fluency Features for Forensic Analysis' (McDougall and Duckworth 2017). Individuals exhibit a wide range of variation in their overall rate of production of disfluency features, and these rates are relatively consistent within-speaker across interview and telephone styles. The results for each specific disfluency feature type also show patterns of relatively consistent behaviour within-speaker across-style for most features. For both interview and telephone styles, discriminant analyses based on speaker profiles of disfluency features demonstrate that disfluency features carry speaker-specific information which could be considered alongside other analyses in forensic speaker comparison cases.
References
Alzqhoul, E. A. S., Nair, B. B. T. and Guillemin, B. J. (2015) Impact of dynamic rate coding aspects of mobile phone networks on forensic voice comparison. Science and Justice 55(5): 363–374. https://doi.org/10.1016/j.scijus.2015.04.006
Amino, K. and Arai, T. (2009) Speaker-dependent characteristics of the nasals. Forensic Science International 185(1–3): 21–28. https://doi.org/10.1016/j.forsciint.2008.11.018
Bachorowski, J.-A. and Owren, M. J. (1999) Acoustic correlates of talker sex and individual talker identity are present in a short vowel segment produced in running speech. Journal of the Acoustical Society of America 106(2): 1054–1063. https://doi.org/10.1121/1.427115
Boersma, P. and Weenink, D. (1992–2018) Praat: A System for Doing Phonetics by Computer [computer program]. http://www.praat.org/
Bortfeld, H., Leon, S. D., Bloom, J. E., Schober, M. F. and Brennan, S. E. (2001) Disfluency rates in conversation: effects of age, relationship, topic, role and gender. Language and Speech 44(2): 123–147. https://doi.org/10.1177/00238309010440020101
Braun, A. (1995) Fundamental frequency – how speaker-specific is it? In A. Braun and J.-P. Köster (eds.) Studies in Forensic Phonetics (Beiträge zur Phonetik und Linguistik 64) 9–23. Trier: Wissenschaftlicher Verlag Trier.
Braun, A. and Künzel, H. J. (2003) The effect of alcohol on speech prosody. In M. J. Solé, D. Recasens and J. Romero (eds.) Proceedings of the 15th International Congress of Phonetic Sciences, Barcelona 2645–2648. https://www.internationalphoneticassociation.org/icphs-proceedings/ICPhS2003/papers/p2615_2645.pdf.
Braun, A. and Rosin, A. (2015) On the speaker-specificity of hesitation markers. In The Scottish Consortium for ICPhS 2015 (ed.) Proceedings of the 18th International Congress of Phonetic Sciences, Glasgow. Paper number 731. http://www.icphs2015.info/pdfs/Papers/ICPHS0731.pdf.
Clark, H. H. and Fox Tree, J. E. (2002) Using uh and um in spontaneous speaking. Cognition 84(1): 73–111. https://doi.org/10.1016/S0010-0277(02)00017-3
de Jong, G., McDougall, K., Hudson, T. and Nolan, F. (2007) The speaker-discriminating power of sounds undergoing historical change: a formant-based study. In J. Trouvain and W. J. Barry (eds.) Proceedings of the 16th International Congress of Phonetic Sciences, Saarbrücken 1813–1816. http://www.icphs2007.de/conference/Papers/1542/1542.pdf.
Dellwo, V. and Koreman, J. (2008) How speaker idiosyncratic is measurable speech rhythm? Paper presented at the International Association for Forensic Phonetics and Acoustics Annual Conference, Lausanne, 20–23 July 2008.
Dellwo, V., Leemann, A. and Kolly, M.-J. (2015) Rhythmic variability between speakers: articulatory, prosodic, and linguistic factors. Journal of the Acoustical Society of America 137(3): 1513–1528. https://doi.org/10.1121/1.4906837
Eklund, R. (2004) Disfluency in Swedish human–human and human–machine travel booking dialogues. Linköping University Studies in Science and Technology Dissertation No. 882. http://www.ida.liu.se/~robek28/pdf/Eklund_2004_PhD_Thesis_Corrected.pdf.
Finlayson, I. R. and Corley, M. (2012) Disfluency in dialogue: an intentional signal from the speaker? Psychonomic Bulletin and Review 19(5): 921–928. https://doi.org/10.3758/s13423-012-0279-x
Goldman-Eisler, F. (1968) Psycholinguistics: Experiments in Spontaneous Speech. London: Academic Press.
Guillemin, B. J. and Watson, C. (2008) Impact of the GSM mobile phone network on the speech signal – some preliminary findings. International Journal of Speech, Language and the Law 15(2): 193–218.
Hollien, H., de Jong, G., Martin, C. A., Schwartz, R. and Liljegren, K. (2001) Effects of ethanol intoxication on speech suprasegmentals. Journal of the Acoustical Society of America 110(6): 3198–3206. https://doi.org/10.1121/1.1413751
Hollien, H., Liljegren, K., Martin, C. A. and de Jong, G. (1999) Prediction of intoxication levels by speech analysis. In A. Braun (ed.) Advances in Phonetics 40–50. Stuttgart: Steiner Verlag.
Hudson, T., de Jong, G., McDougall, K., Harrison, P. and Nolan, F. (2007) F0 statistics for 100 young male speakers of Standard Southern British English. In J. Trouvain and W. J. Barry (eds.) Proceedings of the 16th International Congress of Phonetic Sciences, Saarbrücken 1809–1812. http://www.icphs2007.de/conference/Papers/1570/1570.pdf.
Hughes, V. (2014) The definition of the relevant population and the collection of data for likelihood ratio-based forensic voice comparison. PhD dissertation, University of York.
Hughes, V., Wood, S. and Foulkes, P. (2016) Strength of forensic voice comparison evidence from the acoustics of filled pauses. International Journal of Speech, Language and the Law 23(1): 99–132. https://doi.org/10.1558/ijsll.v23i1.29874
Huntley Bahr, R. and Pass, K. J. (1996) The influence of style-shifting on voice identification. Forensic Linguistics 3(1): 24–38.
Ishihara, S. and Kinoshita, Y. (2008) How many do we need? Exploration of the population size effect on the performance of forensic speaker classification. In Proceedings of the 9th Annual Conference of the International Speech Communication Association (Interspeech), Brisbane, Australia 1941–1944.
Jessen, M. (1997) Speaker-specific information in voice quality parameters. Forensic Linguistics 4(1): 84–103.
Jessen, M. (2007) Forensic reference data on articulation rate in German. Science and Justice 47(2): 50–67. https://doi.org/10.1016/j.scijus.2007.03.003
Jessen, M. (2008) Forensic phonetics. Language and Linguistics Compass 2(4): 671–711. https://doi.org/10.1111/j.1749-818X.2008.00066.x
Jessen, M., Köster, O. and Gfroerer, S. (2005) Influence of vocal effort on average and variability of fundamental frequency. International Journal of Speech, Language and the Law 12(2): 174–213. https://doi.org/10.1558/sll.2005.12.2.174
Kasl, S. V. and Mahl, G. F. (1965) Relationship of disturbances and hesitations in spontaneous speech to anxiety. Journal of Personality and Social Psychology 1(5): 425–433. https://doi.org/10.1037/h0021918
Kavanagh, C. (2012) New consonantal acoustic parameters for forensic speaker comparison. PhD dissertation, University of York.
Kinoshita, Y., Ishihara, S. and Rose, P. (2009) Exploring the discriminatory potential of F0 distribution parameters in traditional forensic speaker recognition. International Journal of Speech, Language and the Law 16(1): 91–111. https://doi.org/10.1558/ijsll.v16i1.91
Klatt, D. (1976) Linguistic uses of segmental duration in English: acoustic and perceptual evidence. The Journal of the Acoustical Society of America 59(5): 1208–1221. https://doi.org/10.1121/1.380986
Künzel, H. J. (1989) How well does average fundamental frequency correlate with speaker height and weight? Phonetica 46(1–3): 117–125. https://doi.org/10.1159/000261832
Künzel, H. J. (1997) Some general phonetic and forensic aspects of speaking tempo. Forensic Linguistics 4(1): 48–83.
Künzel, H. J. (2001) Beware of the ‘telephone effect’: the influence of telephone transmission on the measurement of formant frequencies. Forensic Linguistics 8(1): 80–99.
Leemann, A. and Kolly, M.-J. (2015) Speaker-invariant suprasegmental temporal features in normal and disguised speech. Speech Communication 75: 97–122. https://doi.org/10.1016/j.specom.2015.10.002
Leemann, A., Kolly, M.-J. and Dellwo, V. (2014) Speaker-individuality in suprasegmental temporal features: implications for forensic voice comparison. Forensic Science International 238: 59–67. https://doi.org/10.1016/j.forsciint.2014.02.019
Leemann, A., Mixdorff, H., O’Reilly, M., Kolly, M.-J. and Dellwo, V. (2014) Speaker-individuality in Fujisaki model f0 features: implications for forensic voice comparison. International Journal of Speech, Language and the Law 21(2): 343–370. https://doi.org/10.1558/ijsll.v21i2.343
Lindblom, B. (1963) Spectrographic study of vowel reduction. Journal of the Acoustical Society of America 35(11): 1773–1781. https://doi.org/10.1121/1.1918816
Lindh, J. and Eriksson, A. (2007) Robustness of long time measures of fundamental frequency. In Proceedings of the 8th Annual Conference of the International Speech Communication Association (Interspeech), Antwerp, Belgium 2025–2028.
McDougall, K. (2004) Speaker-specific formant dynamics: an experiment on Australian English /aɪ/. International Journal of Speech, Language and the Law 11(1): 103–130. https://doi.org/10.1558/sll.2004.11.1.103
McDougall, K. (2006) Dynamic features of speech and the characterization of speakers: toward a new approach using formant frequencies. International Journal of Speech, Language and the Law 13(1): 89–126. https://doi.org/10.1558/sll.2006.13.1.89
McDougall, K. and Duckworth, M. (2017) Profiling fluency: an analysis of individual variation in disfluencies in adult males. Speech Communication 95: 16–27. https://doi.org/10.1016/j.specom.2017.10.001
McDougall, K. and Nolan, F. (2007) Discrimination of speakers using the formant dynamics of /uː/ in British English. In J. Trouvain and W. J. Barry (eds.) Proceedings of the 16th International Congress of Phonetic Sciences, Saarbrücken 1825–1828. http://www.icphs2007.de/conference/Papers/1567/1567.pdf.
McDougall, K., Rhodes, R., Duckworth, M., French, J. P., Kirchhübel, C. and Wormald, J. (2018) Applying disfluency analysis in forensic speaker comparison casework. Paper presented at the International Association for Forensic Phonetics and Acoustics Annual Conference, Huddersfield, 29 July–1 August 2018.
Moos, A. (2010) Long-term formant distribution as a measure of speaker characteristics in read and spontaneous speech. The Phonetician 101/102: 7–24.
Morrison, G. S. (2008) Forensic voice comparison using likelihood ratios based on polynomial curves fitted to the formant trajectories of Australian English /aɪ/. International Journal of Speech, Language and the Law 15(2): 249–266.
Nair, B. B. T., Alzqhoul, E. A. S. and Guillemin, B. J. (2016) Impact of the GSM and CDMA mobile phone networks on the strength of speech evidence in forensic voice comparison. Journal of Forensic Research 7(2): 322. https://www.omicsonline.org/open-access/impact-of-the-gsm-and-cdma-mobile-phone-networks-on-the-strength-ofspeech-evidence-in-forensic-voice-comparison-2157-7145-1000324.pdf. https://doi.org/10.4172/2157-7145.1000324
Nolan, F. (1983) The Phonetic Bases of Speaker Recognition. Cambridge: Cambridge University Press.
Nolan, F. (1997) Speaker recognition and forensic phonetics. In W. J. Hardcastle and J. Laver (eds.) The Handbook of Phonetic Sciences 744–767. Oxford: Blackwell.
Nolan, F. (2002) Intonation in speaker identification: an experiment on pitch alignment features. Forensic Linguistics 9(1): 1–21. https://doi.org/10.1558/sll.2002.9.1.1
Nolan, F. and Grigoras, C. (2005) A case for formant analysis in forensic speaker identification. International Journal of Speech, Language and the Law 12(2): 143–173. https://doi.org/10.1558/sll.2005.12.2.143
Nolan, F., McDougall, K., de Jong, G. and Hudson, T. (2009) The DyViS database: style-controlled recordings of 100 homogeneous speakers for forensic phonetic research. International Journal of Speech, Language and the Law 16(1): 31–57.
Oviatt, S. (1995) Predicting spoken disfluencies during human–computer interaction. Computer Speech and Language 9(1): 19–35. https://doi.org/10.1006/csla.1995.0002
Peterson, G. E. and Barney, H. L. (1952) Control methods used in a study of the vowels. Journal of the Acoustical Society of America 24(2): 175–184. https://doi.org/10.1121/1.1906875
Roberts, P. M., Meltzer, A. and Wilding, J. (2009) Disfluencies in non-stuttering adults across sample lengths and topics. Journal of Communication Disorders 42(6): 414–427. https://doi.org/10.1016/j.jcomdis.2009.06.001
Rose, P. (1999) Long- and short-term within-speaker differences in the formants of Australian hello. Journal of the International Phonetic Association 29(1): 1–31. https://doi.org/10.1017/S0025100300006393
Rose, P. (2002) Forensic Speaker Identification. London: Taylor and Francis. https://doi.org/10.1201/9780203166369
Rose, P. (2010) The effect of correlation on strength of evidence estimates in Forensic Voice Comparison: uni- and multivariate Likelihood Ratio-based discrimination with Australian English vowel acoustics. International Journal of Biometrics 2(4): 316–329. https://doi.org/10.1504/IJBM.2010.035447
Schachter, S., Christenfeld, N., Ravina, B. and Bilous, F. (1991) Speech disfluency and the structure of knowledge. Journal of Personality and Social Psychology 60(3): 362–367. https://doi.org/10.1037/0022-3514.60.3.362
Schiel, F. and Heinrich, C. (2015) Disfluencies in the speech of intoxicated speakers. International Journal of Speech, Language and the Law 22(1): 19–33. https://doi.org/10.1558/ijsll.v22i1.24767
Schiel, F., Heinrich, C. and Barfüßer, S. (2012) Alcohol language corpus: the first public corpus of alcoholized German speech. Language Resources and Evaluation 46(3): 503–521. https://doi.org/10.1007/s10579-011-9139-y
Shriberg, E. (2001) To ‘errrr’ is human: ecology and acoustics of speech disfluencies. Journal of the International Phonetic Association 31(1): 153–169. https://doi.org/10.1017/S0025100301001128
Tabachnick, B. G. and Fidell, L. S. (2014) Using Multivariate Statistics (6th edn). Harlow: Pearson.
Wightman, C. W., Shattuck-Hufnagel, S., Ostendorf, M. and Price, P. J. (1992) Segmental durations in the vicinity of prosodic phrase boundaries. The Journal of the Acoustical Society of America 91(3): 1707–1717. https://doi.org/10.1121/1.402450