Between-speaker rhythmic variability is not dependent on language rhythm, as evidence from Persian reveals


  • Homa Asadi, Alzahra University
  • Mandana Nourbakhsh, Alzahra University
  • Lei He, University of Tübingen
  • Elisa Pellegrino, University of Zurich
  • Volker Dellwo, University of Zurich



speaker idiosyncrasies, speech rhythm, forensic phonetics


Acoustic measures of speech rhythm based on the durational characteristics of consonantal and vocalic intervals (henceforth C- or V-intervals), as well as syllabic intensity, reveal between-speaker variability. The evidence obtained so far is based on speakers of stress-timed languages, which are assumed to have complex consonant clusters and a higher degree of vowel reduction. Speakers of stress-timed languages might operate their articulatory organs in different ways because of this syllable complexity and vowel reduction: complex consonant clusters are released differently, and vowel reduction is produced more or less strongly depending on the speaker. When a language lacks such features, rhythmic variation between its speakers may decrease. In the present study, we explored between- and within-speaker rhythmic variability in Persian, an Indo-European language categorised as syllable-timed. Acoustic correlates of speech rhythm (%V, ΔV(ln), ΔC(ln), nPVI-V) and articulation rate were obtained from two Persian corpora with different sources of within-speaker variability. In the first corpus, within-speaker variability mainly comes from non-contemporaneous recording sessions, and in the second corpus, from different speech rates. Results revealed significant differences between speakers in all investigated speech rhythm measures in Persian, and %V best discriminated between speakers. This indicates that the lack of typical stress-timed features does not affect between-speaker variability in speech rhythm.
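
As an illustration of the rhythm measures named above, the following is a minimal sketch under the definitions commonly used in the rhythm-metrics literature, not the authors' own analysis scripts. It assumes that %V is the percentage of total interval duration that is vocalic, that ΔV(ln) and ΔC(ln) are standard deviations of natural-log-transformed interval durations, and that nPVI is the mean normalised difference between successive intervals multiplied by 100. The function names, the use of the population standard deviation and the example durations are all hypothetical.

```python
import math
from statistics import pstdev


def percent_v(v_durs, c_durs):
    """%V: share of total interval duration that is vocalic, in percent."""
    total = sum(v_durs) + sum(c_durs)
    return 100.0 * sum(v_durs) / total


def delta_ln(durs):
    """Delta(ln): standard deviation of ln-transformed interval durations.

    Assumption: population standard deviation; a study may use the
    sample standard deviation instead.
    """
    return pstdev([math.log(d) for d in durs])


def npvi(durs):
    """nPVI: mean normalised difference of successive intervals, times 100."""
    terms = [abs(a - b) / ((a + b) / 2.0) for a, b in zip(durs, durs[1:])]
    return 100.0 * sum(terms) / len(terms)


# Hypothetical interval durations in seconds for one utterance.
v_intervals = [0.08, 0.12, 0.10, 0.15]
c_intervals = [0.07, 0.11, 0.09, 0.06]

measures = {
    "%V": percent_v(v_intervals, c_intervals),
    "deltaV(ln)": delta_ln(v_intervals),
    "deltaC(ln)": delta_ln(c_intervals),
    "nPVI-V": npvi(v_intervals),
}
```

In a study of this kind, such values would be computed per utterance and per speaker from hand-corrected segmentations, and the between-speaker comparison would then be run on the resulting table of measures.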

Author Biographies

Homa Asadi, Alzahra University

Homa Asadi is a PhD candidate in General Linguistics at Alzahra University. She holds a BA in English Language and Literature from Shahid Chamran University of Ahwaz and an MA in General Linguistics from Alzahra University, Iran. She has taught phonetics as a teaching assistant at Alzahra University. Her research interests lie primarily in forensic phonetics, in particular the speaker-specific acoustic parameters encoded in the speech of individuals speaking Persian and other languages and dialects spoken in Iran.

Mandana Nourbakhsh, Alzahra University

Mandana Nourbakhsh has a PhD in General Linguistics from the University of Tehran, and she is currently an associate professor teaching phonetics, phonology and psycholinguistics at the Linguistics Department of Alzahra University, Iran. Her areas of research interest include laboratory phonetics and phonology as well as psycholinguistics and psychoacoustics. She has published numerous papers on issues related to her main areas of research interest.

Lei He, University of Tübingen

Lei He (PhD, MSc, MA, BA) is a postdoctoral fellow at the University of Tübingen supported by an early postdoc mobility grant (P2ZHP1_178109) from the Swiss National Science Foundation. He received his doctoral degree at the University of Zurich, where he also worked as a postdoctoral researcher for one year. He is interested in between-speaker variability in speech production, in particular how articulatory factors affect the acoustic parameters that underpin the rhythmical differences between speakers. 

Elisa Pellegrino, University of Zurich

Elisa Pellegrino has a PhD in Linguistics from the University of Naples L'Orientale. She is currently working as a postdoc in phonetics at the University of Zurich, where she also teaches Computational Processing of Speech Rhythm for Speaker and Language Classification. Her research interests range from accommodation in speech communication and the production and perception of L2 speech rhythm to age-related changes in speech and voice.

Volker Dellwo, University of Zurich

Volker Dellwo (MA, PhD) is Associate Professor of Phonetics and Speech Sciences in the Department of Computational Linguistics at the University of Zurich (UZH) and works as an expert witness in forensic phonetics in the departmental Center of Forensic Phonetics and Acoustics. His research interests lie in a wide variety of phenomena related to speaker individuality and speaker recognition by humans and machines. He is an executive committee member of the International Association of Forensic Phonetics and Acoustics.





How to Cite

Asadi, H., Nourbakhsh, M., He, L., Pellegrino, E., & Dellwo, V. (2018). Between-speaker rhythmic variability is not dependent on language rhythm, as evidence from Persian reveals. International Journal of Speech, Language and the Law, 25(2), 151–174.