Aging effects on voice features used in forensic speaker comparison

Authors

  • Richard Rhodes, J P French Associates & University of York

DOI:

https://doi.org/10.1558/ijsll.34096

Keywords:

FORENSIC SPEAKER COMPARISON, NON-CONTEMPORANEITY, AGING, VOWEL FORMANTS, ASR

Abstract

This article assesses the impact of long-term non-contemporaneity between recordings on the strength of forensic voice evidence and provides recommendations for casework. Analyses of longitudinal recordings from a British television documentary series illustrate the effects of aging on forensically useful acoustic parameters. Recordings from eight speakers were made at five points, seven years apart, between ages 21 and 49. The frequencies of the first three formants of nine monophthongs and two diphthongs decrease over time. Strength-of-evidence estimates for non-contemporaneous comparisons are calculated from these data using a likelihood ratio (LR) approach. As expected, longer delays yield weaker LRs and fewer correct ones. The effect of aging on the performance of an automatic speaker recognition (ASR) system (BATVOX) is also tested; performance varies between speakers but deteriorates considerably over longer delays for all of them. Findings from this and similar studies should be considered when carrying out formant- or ASR-based comparisons across long delays, and when selecting age-appropriate reference data.
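
To make the likelihood ratio idea concrete, the following is a minimal sketch and not necessarily the procedure used in the article. It assumes a single formant measurement modeled with univariate Gaussians. The function names and all values in Hz are hypothetical, chosen only to show how a downward formant shift in a later recording can weaken or reverse an LR for a same-speaker comparison.

```python
import math


def gaussian_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def likelihood_ratio(x, suspect_mean, suspect_sd, pop_mean, pop_sd):
    """LR = p(x | same-speaker model) / p(x | background population model)."""
    same_speaker = gaussian_pdf(x, suspect_mean, suspect_sd)
    different_speaker = gaussian_pdf(x, pop_mean, pop_sd)
    return same_speaker / different_speaker


# Hypothetical values (Hz), for illustration only.
SUSPECT_MEAN, SUSPECT_SD = 500.0, 30.0   # model built from the earlier recording
POP_MEAN, POP_SD = 560.0, 60.0           # assumed reference population

# The same speaker measured later, with progressively lower formant values.
for later_value in (500.0, 440.0, 410.0):
    lr = likelihood_ratio(later_value, SUSPECT_MEAN, SUSPECT_SD, POP_MEAN, POP_SD)
    print(f"measurement={later_value:.0f} Hz  LR={lr:.3f}  log10(LR)={math.log10(lr):.3f}")
```

In this toy setup the LR falls as the later measurement drifts away from the earlier model, and for the largest shift it drops below 1, which would wrongly favor the different-speaker hypothesis. A real analysis would use multivariate features and properly estimated within- and between-speaker variation.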

Published

2017-12-20

Section

Articles

How to Cite

Rhodes, R. (2017). Aging effects on voice features used in forensic speaker comparison. International Journal of Speech, Language and the Law, 24(2), 177-199. https://doi.org/10.1558/ijsll.34096