Forensic comparison of ageing voices from automatic and auditory perspectives

Authors

  • Finnian Kelly Center for Robust Speech Systems (CRSS), the University of Texas at Dallas
  • Naomi Harte Trinity College Dublin

DOI:

https://doi.org/10.1558/ijsll.v22i2.21760

Keywords:

non-contemporary speech, vocal ageing, automatic speaker identification

Abstract

Comparison of non-contemporary speech samples occurs frequently in forensic speaker-recognition cases. While ageing-related changes in the voice have been well investigated, the effect of ‘vocal ageing’ on forensic speaker recognition has yet to be fully established. This article presents auditory and automatic experiments that provide deeper insight into the impact of ageing on forensic speaker recognition. First, a listener test investigates the extent to which vocal ageing is detectable by lay listeners. A test set of 10 males and 10 females, with recordings spanning approximately 30 years per speaker, is taken from the Trinity College Dublin Speaker Ageing (TCDSA) database. Correct detection of ageing in two samples of the same speaker is found to increase from 64% at a 10-year age difference to 86% at a 30-year age difference. Ageing is significantly more detectable in female speakers than in male speakers, and female listeners are significantly better at detecting ageing than male listeners. A link between ageing detectability and speaking fundamental frequency is also observed. A forensic automatic speaker recognition (FASR) experiment with ageing speakers is then presented. Given a test set of five male speakers from the TCDSA database, each with multiple recordings spanning 30–50 years, ageing is shown to progressively weaken the strength of evidence (likelihood ratios) of same-speaker comparisons. While the extent of the ageing effect varies between speakers, instances of erroneous support for the different-speaker hypothesis are introduced for all speakers within a time-lapse of 10 years. The detrimental effect of ageing on the overall FASR system is also illustrated via Tippett plots.
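To make the likelihood-ratio and Tippett-plot terminology concrete, the following is a minimal Python sketch of how the curve underlying a Tippett plot might be computed from a set of log10 likelihood ratios. It is not the authors' implementation: the function names, the thresholds, and the score values are hypothetical and chosen purely for illustration, and they are not results from the article.

```python
def tippett_curve(log10_lrs, thresholds):
    """Proportion of comparisons whose log10 LR is >= each threshold.

    Plotting this proportion against the thresholds gives one curve
    of a Tippett plot.
    """
    n = len(log10_lrs)
    return [sum(1 for v in log10_lrs if v >= t) / n for t in thresholds]


def misleading_rate_same_speaker(log10_lrs):
    """Fraction of same-speaker comparisons with LR < 1 (log10 LR < 0),
    i.e. comparisons that wrongly support the different-speaker hypothesis."""
    return sum(1 for v in log10_lrs if v < 0) / len(log10_lrs)


# Hypothetical toy log10 LRs for same-speaker comparisons (illustrative only).
same_short_gap = [2.0, 1.5, 1.0, 0.5]    # assumed: short time-lapse
same_long_gap = [1.0, 0.2, -0.3, -1.0]   # assumed: long time-lapse
thresholds = [-1.0, 0.0, 1.0]

curve_short = tippett_curve(same_short_gap, thresholds)
curve_long = tippett_curve(same_long_gap, thresholds)

print("short gap:", curve_short, misleading_rate_same_speaker(same_short_gap))
print("long gap: ", curve_long, misleading_rate_same_speaker(same_long_gap))
```

In this toy setting, the long-gap curve sits below the short-gap curve and a nonzero share of same-speaker comparisons falls below LR = 1, which is the kind of pattern the abstract describes as weakened strength of evidence and erroneous support for the different-speaker hypothesis.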

Author Biographies

  • Finnian Kelly, Center for Robust Speech Systems (CRSS), the University of Texas at Dallas
    Finnian Kelly is a Research Associate at the Center for Robust Speech Systems (CRSS) at The University of Texas at Dallas. Prior to joining CRSS, he was with the Sigmedia Research group at Trinity College Dublin, Ireland, where he completed his PhD in 2013. His research interests include automatic speaker recognition in the presence of speaker variability and the application of speaker recognition to forensics.
  • Naomi Harte, Trinity College Dublin
    Naomi Harte is an Assistant Professor with the School of Engineering, Trinity College Dublin, Ireland, where she was appointed as an SFI Engineering Initiative Lecturer in digital media in 2008. Prior to returning to academia, she worked in high-tech start-ups in the field of DSP systems development. Her research interests include speech quality, audio-visual speech recognition, emotion in speech, speaker verification, and bird species analysis from song.

Published

2015-11-06

Section

Articles

How to Cite

Kelly, F., & Harte, N. (2015). Forensic comparison of ageing voices from automatic and auditory perspectives. International Journal of Speech, Language and the Law, 22(2), 167-202. https://doi.org/10.1558/ijsll.v22i2.21760