Audio-visual speech perception of plosive consonants by CG learners of English
DOI: https://doi.org/10.1558/jmbs.23017

Keywords: audio-visual speech perception, plosive consonants, Cypriot-Greek, second language

Abstract
Second language (L2) speech perception can be a challenging process, as listeners have to cope with imperfect auditory signals and imperfect L2 knowledge. The aim of L2 speech perception, however, is to extract linguistic meaning and enable communication between interlocutors in the language of input. Normal-hearing listeners can effortlessly perceive and understand the auditory messages conveyed, even in the presence of distortions and background noise, as they can tolerate a dramatic reduction in the spectral and temporal information available in the auditory signal. In recognising speech, listeners can be substantially assisted by looking at the speaker’s face. Visual perception is important even for intelligible speech sounds, indicating that auditory and visual information should be combined. The present study examines how audio-visual integration affects Cypriot-Greek (CG) listeners’ recognition of plosive consonants at the word level in L2 English. The participants were 14 first language (L1) CG users who were non-native speakers of English. They completed a perceptual minimal-set task requiring the extraction of speech information from unimodal auditory stimuli, unimodal visual stimuli, bimodal audio-visual congruent stimuli, and bimodal audio-visual incongruent stimuli. The findings indicated that overall performance was better in the bimodal congruent condition. The results point to a multisensory, speech-specific mode of perception, which plays an important role in alleviating the majority of moderate to severe L2 comprehension difficulties. CG listeners’ success seems to depend on their ability to relate what they see to what they hear.