Learning L2 pronunciation with a mobile speech recognizer: French /y/

Authors

  • Denis Liakin Concordia University
  • Walcir Cardoso Concordia University
  • Natallia Liakina McGill University

DOI:

https://doi.org/10.1558/cj.v32i1.25962

Keywords:

speech recognition, pronunciation, second language acquisition, learner autonomy

Abstract

This study investigates the acquisition of the L2 French vowel /y/ in a mobile-assisted learning environment, via the use of automatic speech recognition (ASR). Specifically, it addresses the question of whether ASR-based pronunciation instruction using a mobile device can improve the production and perception of French /y/. Forty-two elementary French students participated in an experimental study in which they were assigned to one of three groups: (1) the ASR Group, which used an ASR application on their mobile devices to complete weekly pronunciation activities, with immediate written (textual) feedback provided by the software and no human interaction; (2) the Non-ASR Group, which completed the same pronunciation activities in weekly individual sessions with a teacher who provided immediate oral feedback using recasts and repetitions; and (3) the Control Group, which participated in weekly individual meetings ‘to practice their conversation skills’ with a teacher who provided no pronunciation feedback. The study followed a pretest/posttest design. According to the results of the dependent samples t-tests, only the ASR Group improved significantly in production from pretest to posttest (p < 0.001), and none of the groups improved in perception. The overall success of the ASR Group on the production measures suggests that this type of learning environment is propitious for the development of segmental features such as /y/ in L2 French.

Author Biographies

  • Denis Liakin, Concordia University
    Denis Liakin is an Associate Professor of French and Linguistics at Concordia University (Montreal, Canada). His research interests include effects of computer technology on L2 learning, corrective phonetics and second language acquisition of syntax.
  • Walcir Cardoso, Concordia University
    Walcir Cardoso is an Associate Professor of Applied Linguistics at Concordia University (Montreal, Canada). He conducts research on the second/foreign language acquisition of phonology, morphosyntax and vocabulary, and the effects of computer technology (e.g., clickers, text-to-speech synthesizers, automatic speech recognition) on L2 learning.
  • Natallia Liakina, McGill University
    Natallia Liakina’s professional experience includes teaching French as a second language at the university level in Ontario and in Quebec. Since 2006, she has taught at the French Language Centre at McGill University. Her current research is focused on corrective phonetics and the impact of new technologies on second language teaching and learning both in the classroom and in computer lab settings.


Published

2014-12-08

Section

Articles

How to Cite

Liakin, D., Cardoso, W., & Liakina, N. (2014). Learning L2 pronunciation with a mobile speech recognizer: French /y/. CALICO Journal, 32(1), 1-25. https://doi.org/10.1558/cj.v32i1.25962