Assessing the effects of accent-mismatched reference population databases on the performance of an automatic speaker recognition system

Authors

  • Dominic Watt University of York
  • Philip Harrison University of York
  • Vincent Hughes University of York
  • Peter French University of York
  • Carmen Llamas University of York
  • Almut Braun Bundeskriminalamt (BKA)
  • Duncan Robertson University of York

DOI:

https://doi.org/10.1558/ijsll.41466

Keywords:

Forensic phonetics, automatic speaker recognition, speech technology, forensic speaker comparison

Abstract

Automatic Speaker Recognition (ASR) systems are designed to provide the user with statistics relating to the similarity of two or more speech samples and to the typicality of those shared features in the wider population. When an ASR system is used as part of a forensic investigation, the user must decide what counts as the appropriate ‘wider population’ and select a reference database accordingly. While it has generally been held that the voices populating the reference database should be similar in accent to that of the samples under consideration, the degree to which the accents should correspond has until now not been investigated empirically. We report in this article on a study in which the composition of the reference database was systematically varied in terms of accent, using corpora of samples of Standard Southern British English and of three subvarieties spoken in North-East England (Newcastle, Sunderland, Middlesbrough).
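The relationship between similarity, typicality and the choice of reference database can be illustrated with a toy calculation. The sketch below is not the system or the method used in the study: it assumes a single, invented one-dimensional feature, models each set of samples as a Gaussian, and divides similarity to the known speaker by typicality in the reference population. All function names, variable names and data values are hypothetical.

```python
from statistics import NormalDist, mean, stdev


def likelihood_ratio(questioned, known_samples, reference_samples):
    """Toy ratio of similarity to typicality for one scalar feature.

    Assumption: each sample set is modelled as a single Gaussian.
    Real ASR systems use far richer features and models.
    """
    known_model = NormalDist(mean(known_samples), stdev(known_samples))
    reference_model = NormalDist(mean(reference_samples), stdev(reference_samples))
    similarity = known_model.pdf(questioned)      # fit to the known speaker
    typicality = reference_model.pdf(questioned)  # fit to the wider population
    return similarity / typicality


# Invented feature values, for illustration only (not data from the study).
known_speaker = [1.0, 1.2, 0.9, 1.1]
matched_reference = [0.8, 1.0, 1.3, 1.1, 0.9, 1.2]     # same accent as the samples
mismatched_reference = [2.8, 3.1, 3.0, 2.9, 3.3, 3.2]  # a different accent
questioned = 1.05

lr_matched = likelihood_ratio(questioned, known_speaker, matched_reference)
lr_mismatched = likelihood_ratio(questioned, known_speaker, mismatched_reference)
print(lr_matched, lr_mismatched)
```

In this invented case the same pair of samples yields a very different ratio depending only on which reference set supplies the typicality term, which is why the choice of reference database matters.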

Author Biographies

  • Dominic Watt, University of York

    Dominic Watt is Senior Lecturer in Forensic Speech Science at the University of York, UK. His research interests include forensic linguistics and phonetics, speech perception, sociophonetics, dialectology, and language and identity studies. He was Co-Investigator on the UK Economic and Social Research Council-funded projects ‘The Use and Utility of Localised Speech Forms in Determining Identity: Forensic and Sociophonetic Perspectives’ (2016–19, ES/M010783/1) and ‘Accent Bias and Fair Access in Britain’ (2017–20, ES/P007767/1). He is co-editor of The Handbook of Dialectology (Wiley, 2018) and, with Carmen Llamas, Language and Identities (Edinburgh University Press, 2010). He undertakes occasional forensic casework on behalf of J P French Associates, York.

  • Philip Harrison, University of York

    Philip Harrison is a forensic consultant and company director, specialising in the areas of acoustics, phonetics and the analysis of evidential recordings. He has worked at J P French Associates since 1997 and has expertise in authentication, enhancement, transcription and speaker comparison. He is also active in research on forensic speech and audio analysis, including assessing the efficacy of acoustic analysis software, measuring the performance of biometric systems and evaluating methods for expressing the strength of forensic speech evidence. He is playing a key role in developing quality standards in the field in conjunction with the UK Forensic Science Regulator, and teaches technical aspects of forensic speech science at the University of York.

  • Vincent Hughes, University of York

    Vincent Hughes is Lecturer in Forensic Speech Science at the University of York, UK. His research interests lie in forensic speech science, phonetics, phonology, sociophonetics and sociolinguistics. His current research focuses on understanding the bases and limitations of individual speaker characterisation and the relative contribution of acoustic, auditory and biological information. He is also interested in the application of the numerical likelihood ratio framework to the evaluation of speech evidence in forensic voice comparison cases. His doctoral research considered how the definition of the relevant population with regard to regional and social dimensions of variability and sample size affects the numerical estimation of the strength of evidence.

  • Peter French, University of York

    Peter French is Professor of Forensic Speech Science in the Department of Language and Linguistic Science at the University of York, UK, Visiting Professor of the same subject in the Department of Modern Languages and Linguistics at the University of Huddersfield, UK, Company Chairman of J P French Associates Forensic Speech and Acoustics Laboratory, and President of the International Association for Forensic Phonetics and Acoustics.

  • Carmen Llamas, University of York

    Carmen Llamas is Senior Lecturer in Sociolinguistics at the University of York, UK. She is Principal Investigator on the project ‘The Use and Utility of Localised Speech Forms in Determining Identity: Forensic and Sociophonetic Perspectives’ (TUULS). She is co-author, with Joan Beal and Lourdes Burbano-Elizondo, of Urban North-Eastern English: Tyneside to Teesside (Edinburgh University Press, 2012), and co-editor, with Dominic Watt, of Language and Identities (Edinburgh University Press, 2010) and Language, Borders and Identity (Edinburgh University Press, 2014). She is also co-editor, with Louise Mullany and Peter Stockwell, of The Routledge Companion to Sociolinguistics (Routledge, 2007).

  • Almut Braun, Bundeskriminalamt (BKA)

    Almut Braun was Postdoctoral Research Associate on the TUULS project at the University of York, UK, between 2016 and 2019. In November 2015, she completed her doctoral research entitled ‘The speaker identification ability of blind and sighted listeners – an empirical investigation’ (Philipps-Universität Marburg, Germany). She holds an MA in German Philology, Phonetics and Linguistic Engineering. Since March 2020 she has been employed as a forensic speech and audio analyst by the Federal Criminal Police Office (Bundeskriminalamt, BKA) in Wiesbaden, Germany.

  • Duncan Robertson, University of York

    Duncan Robertson was Postdoctoral Research Associate on the TUULS project at the University of York, UK, between 2016 and 2019. He completed his doctoral research at the University of Glasgow in 2015, investigating implicit and explicit associations towards different social accents of English. He is now employed by the UK's Office of Qualifications and Examinations Regulation (Ofqual) as a data analyst.

Published

2020-08-27

Section

Articles

How to Cite

Watt, D., Harrison, P., Hughes, V., French, P., Llamas, C., Braun, A., & Robertson, D. (2020). Assessing the effects of accent-mismatched reference population databases on the performance of an automatic speaker recognition system. International Journal of Speech, Language and the Law, 27(1), 1-34. https://doi.org/10.1558/ijsll.41466