Between-speaker rhythmic variability is not dependent on language rhythm, as evidence from Persian reveals

Homa Asadi; Mandana Nourbakhsh; Lei He; Elisa Pellegrino; Volker Dellwo

doi:10.1558/ijsll.37110

Authors

Homa Asadi, Alzahra University
Mandana Nourbakhsh, Alzahra University
Lei He, University of Tübingen
Elisa Pellegrino, University of Zurich
Volker Dellwo, University of Zurich

DOI:

https://doi.org/10.1558/ijsll.37110

Keywords:

speaker idiosyncrasies, speech rhythm, forensic phonetics

Abstract

Acoustic measures of speech rhythm based on the durational characteristics of consonantal and vocalic intervals (henceforth C- and V-intervals), as well as on syllabic intensity, reveal between-speaker variability. The evidence obtained so far is based on speakers of stress-timed languages, which are assumed to have complex consonant clusters and a higher degree of vowel reduction. Speakers of stress-timed languages might operate their articulatory organs in different ways because of this syllable complexity and vowel reduction: complex consonant clusters are released differently, and vowel reduction is produced more or less strongly depending on the speaker. When a language lacks such features, rhythmic variation between its speakers might decrease. In the present study, we aimed to explore between- and within-speaker rhythmic variability in Persian, an Indo-European language categorised as syllable-timed. Acoustic correlates of speech rhythm (%V, ΔV(ln), ΔC(ln), nPVI-V) and articulation rate were obtained from two Persian corpora with different sources of within-speaker variability. In the first corpus, within-speaker variability comes mainly from non-contemporaneous recording sessions; in the second corpus, it comes from different speech rates. Results revealed significant differences between speakers in all investigated speech rhythm measures in Persian, with %V discriminating best between speakers. This shows that the lack of typical stress-timed features does not affect between-speaker variability in speech rhythm.
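
As a rough illustration of the rhythm measures named in the abstract, the following Python sketch computes them from lists of interval durations in seconds. It follows the commonly used definitions: %V is the vocalic share of total duration, ΔV(ln) and ΔC(ln) are the standard deviations of log-transformed interval durations, and nPVI-V is the normalised pairwise variability index over successive V-intervals. The function names, the use of the sample standard deviation and the example durations are assumptions made for illustration; the abstract does not specify how the authors computed these values.

```python
import math
from statistics import stdev


def percent_v(v_durations, c_durations):
    """%V: percentage of total interval duration that is vocalic."""
    total = sum(v_durations) + sum(c_durations)
    return 100.0 * sum(v_durations) / total


def delta_ln(durations):
    """Standard deviation of natural-log-transformed interval durations.

    Assumption: sample standard deviation (n - 1 denominator).
    """
    return stdev(math.log(d) for d in durations)


def npvi(durations):
    """Normalised pairwise variability index over successive intervals."""
    pairs = list(zip(durations, durations[1:]))
    terms = [abs(a - b) / ((a + b) / 2.0) for a, b in pairs]
    return 100.0 * sum(terms) / len(terms)


def articulation_rate(n_syllables, speech_duration_s):
    """Syllables per second of speech, with pauses already excluded."""
    return n_syllables / speech_duration_s


if __name__ == "__main__":
    # Hypothetical interval durations (seconds) for one utterance.
    v = [0.08, 0.12, 0.10, 0.15]
    c = [0.07, 0.11, 0.06, 0.09]
    print("%V      :", percent_v(v, c))
    print("dV(ln)  :", delta_ln(v))
    print("dC(ln)  :", delta_ln(c))
    print("nPVI-V  :", npvi(v))
    print("art.rate:", articulation_rate(4, sum(v) + sum(c)))
```

In a speaker comparison, such values would be computed per utterance and then compared across speakers and recording sessions.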

Author Biographies

Homa Asadi, Alzahra University

Homa Asadi is a PhD candidate in General Linguistics at Alzahra University. She holds a BA in English Language and Literature from Shahid Chamran University of Ahwaz and an MA in General Linguistics from Alzahra University, Iran. She has taught phonetics as a teaching assistant at Alzahra University. Her research interests lie primarily in forensic phonetics, in particular the speaker-specific acoustic parameters encoded in the speech of individuals speaking Persian and other languages and dialects spoken in Iran.

Mandana Nourbakhsh, Alzahra University

Mandana Nourbakhsh has a PhD in General Linguistics from the University of Tehran, and she is currently an associate professor teaching phonetics, phonology and psycholinguistics at the Linguistics Department of Alzahra University, Iran. Her areas of research interest include laboratory phonetics and phonology as well as psycholinguistics and psychoacoustics. She has published numerous papers on issues related to her main areas of research interest.

Lei He, University of Tübingen

Lei He (PhD, MSc, MA, BA) is a postdoctoral fellow at the University of Tübingen supported by an early postdoc mobility grant (P2ZHP1_178109) from the Swiss National Science Foundation. He received his doctoral degree at the University of Zurich, where he also worked as a postdoctoral researcher for one year. He is interested in between-speaker variability in speech production, in particular how articulatory factors affect the acoustic parameters that underpin the rhythmical differences between speakers.

Elisa Pellegrino, University of Zurich

Elisa Pellegrino has a PhD in Linguistics from the University of Naples L'Orientale. She is currently working as a postdoc in phonetics at the University of Zurich, where she also teaches Computational Processing of Speech Rhythm for Speaker and Language Classification. Her research interests range from accommodation in speech communication and the production and perception of L2 speech rhythm to age-related changes in speech and voice.

Volker Dellwo, University of Zurich

Volker Dellwo (MA, PhD) is Associate Professor of Phonetics and Speech Sciences in the Department of Computational Linguistics at the University of Zurich (UZH) and works as an expert witness in forensic phonetics in the departmental Center of Forensic Phonetics and Acoustics. His research interests lie in a wide variety of phenomena related to speaker individuality and speaker recognition by humans and machines. He is an executive committee member of the International Association of Forensic Phonetics and Acoustics.

