The contribution of dynamic versus static formant information in conversational speech
Keywords: speaker-specificity, formant dynamics, conversational speech, vowel-inherent spectral change
The relative contributions of static and dynamic formant representations to speaker-specificity were investigated in conversational speech, and compared between two vowels differing in vowel-inherent spectral change. Using polynomial fits to formant trajectories, the contribution of the dynamic formant coefficients to speaker-specificity, relative to that of the formant intercept, was investigated in the diphthongal vowel [ei] taken from English and Dutch conversational speech. The [ei] tokens were sampled from various linguistic contexts and analysed within a likelihood-ratio (LR) framework. Results show that formant dynamics contain speaker-specific information in conversational speech, although the high contextual variation appears to reduce their effect relative to that reported in earlier work. Because vowels differ in inherent dynamicity, the added value of dynamic formant information to speaker-specificity was also compared between vowels differing in inherent spectral change: using Dutch data, the contribution of formant dynamics was compared between [ei] and [aː] tokens produced by the same speakers. In conversational speech, formant dynamics contributed to speaker-specificity only in the diphthong [ei], not in the monophthong [aː].
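To illustrate the static/dynamic split described above, the following minimal sketch fits a polynomial to one formant trajectory, separates the intercept (static) from the higher-order coefficients (dynamic), and scores a single value with a toy likelihood ratio. It is hypothetical and not the authors' pipeline: the function names (`static_and_dynamic`, `log10_gaussian_lr`), the quadratic order, the time normalisation (equally spaced points centred on the vowel midpoint, so that the intercept approximates the mid-vowel value) and the univariate Gaussian LR (a deliberate simplification standing in for whatever multivariate LR model the study used) are all assumptions.

```python
import math
from typing import List, Sequence, Tuple


def normalised_time(n: int) -> List[float]:
    """Return n equally spaced time points from -0.5 to 0.5 (0 = vowel midpoint).

    Assumption: centring on the midpoint makes the intercept approximate the
    mid-vowel formant value; the study's own normalisation may differ.
    """
    if n < 2:
        raise ValueError("need at least two measurement points")
    return [i / (n - 1) - 0.5 for i in range(n)]


def polyfit(t: Sequence[float], y: Sequence[float], degree: int) -> List[float]:
    """Ordinary least-squares polynomial fit; returns [c0, c1, ..., c_degree]."""
    m = degree + 1
    # Normal equations (X^T X) c = X^T y for the design matrix X[i][j] = t_i ** j.
    a = [[sum(ti ** (j + k) for ti in t) for k in range(m)] for j in range(m)]
    b = [sum(yi * ti ** j for ti, yi in zip(t, y)) for j in range(m)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(m):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [x - factor * p for x, p in zip(a[r], a[col])]
                b[r] -= factor * b[col]
    return [b[i] / a[i][i] for i in range(m)]


def static_and_dynamic(trajectory: Sequence[float], degree: int = 2) -> Tuple[float, List[float]]:
    """Hypothetical helper: split a formant trajectory (Hz) into intercept and dynamic coefficients."""
    if len(trajectory) <= degree:
        raise ValueError("need more measurement points than the polynomial degree")
    coeffs = polyfit(normalised_time(len(trajectory)), trajectory, degree)
    return coeffs[0], coeffs[1:]


def log10_gaussian_lr(x: float, suspect_mean: float, within_sd: float,
                      population_mean: float, population_sd: float) -> float:
    """Toy univariate log10 LR: similarity to one speaker over typicality in a population."""
    def log10_pdf(v: float, mu: float, sd: float) -> float:
        return (-0.5 * ((v - mu) / sd) ** 2 - math.log(sd * math.sqrt(2 * math.pi))) / math.log(10)
    return log10_pdf(x, suspect_mean, within_sd) - log10_pdf(x, population_mean, population_sd)


# Hypothetical F2 trajectory (Hz) generated from 1900 + 400 t - 100 t^2.
intercept, dynamic = static_and_dynamic([1675.0, 1793.75, 1900.0, 1993.75, 2075.0])
print(round(intercept, 3), [round(c, 3) for c in dynamic])  # prints 1900.0 [400.0, -100.0]
```

With a flat, monophthong-like trajectory such as `[800.0] * 5`, `static_and_dynamic` returns an intercept of 800 and dynamic coefficients of (numerically) zero. This offers one intuition for why dynamic coefficients may add little for a vowel with little inherent spectral change, although in real speech the comparison depends on within- and between-speaker variation, which the toy LR only gestures at.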
© Equinox Publishing Ltd.