Vowel convergence does not affect auditory speaker discriminability in humans and machine in a case study on Swiss German dialects
DOI:
https://doi.org/10.1558/ijsll.19954
Keywords:
phonetic convergence, vowel acoustics, speaker discrimination, automatic speaker verification, Swiss German dialects
Abstract
In this study, we examined whether convergence in interlocutors’ vowel acoustics reduces the discriminability of their voices. Ten pairs of Grison and Zürich German speakers produced lexical items before and after dialogue interactions, and their post-dialogue productions showed evidence of vowel convergence. In Experiment 1, native and non-native Swiss German listeners discriminated pairs of speakers in recordings obtained pre- and post-dialogue. Sensitivity (A’) was higher for native than for non-native listeners but comparable for pre- and post-dialogue recordings. The observed negative correlation between voice discrimination and acoustic distance in formant space was driven mainly by a single speaker pair. In Experiment 2, the speaker recognition performance of an i-vector-based system was compared for pre- and post-dialogue speech. The system’s performance did not differ between the two conditions. The findings suggest that vowel convergence does not compromise voice discriminability under the given experimental conditions.
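The abstract reports listener sensitivity as A’ without giving the computation. As a minimal sketch, the index could be computed as below, assuming the nonparametric formula commonly attributed to Grier (1971) together with its usual mirror-image extension for below-chance performance. The function name `a_prime` is hypothetical, and so is the assumption that a "hit" is a "different" response on a different-speaker trial and a "false alarm" is a "different" response on a same-speaker trial; the study's own procedure may differ.

```python
def a_prime(hit_rate: float, fa_rate: float) -> float:
    """Nonparametric sensitivity index A' from hit and false-alarm rates.

    Assumed formula (not taken from the article itself):
      H >= F: A' = 0.5 + (H - F)(1 + H - F) / (4 H (1 - F))
      H <  F: A' = 0.5 - (F - H)(1 + F - H) / (4 F (1 - H))
    A' = 0.5 corresponds to chance, A' = 1.0 to perfect discrimination.
    """
    for name, p in (("hit_rate", hit_rate), ("fa_rate", fa_rate)):
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {p}")

    h, f = hit_rate, fa_rate
    if h == f:
        # Equal rates mean chance-level sensitivity; this also avoids 0/0.
        return 0.5
    if h > f:
        return 0.5 + ((h - f) * (1 + h - f)) / (4 * h * (1 - f))
    # Below-chance case: mirror image of the formula above.
    return 0.5 - ((f - h) * (1 + f - h)) / (4 * f * (1 - h))


if __name__ == "__main__":
    # Illustrative rates only; these are not data from the study.
    print(a_prime(0.8, 0.2))
```

Under these assumptions, one A’ value would be computed per listener and condition (pre- versus post-dialogue) and then compared across conditions and listener groups.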