Traditional Versus ASR-Based Pronunciation Instruction

An Empirical Study


  • Christina Garcia, Saint Louis University
  • Dan Nickolai, Saint Louis University
  • Lillian Jones, University of California, Davis



Keywords: ASR, CAPT, TTS, pronunciation


This paper presents a 15-week classroom study comparing student outcomes of instructor-led pronunciation lessons with those of entirely ASR-based pronunciation training. Seventy-six second-semester Spanish language learners were divided into two groups, one experimental (n=44) and one control (n=32). Over the course of six modules, both groups completed a pre- and post-study recording, as well as explicit pronunciation training sessions. These sessions included pre- and post-recordings, with either traditional or ASR pronunciation practice in between, which focused on targeted phonemes. All student recordings were evaluated by native and near-native speakers for comprehensibility, nativeness, fluency, and perceived confidence. The results show that the effects of explicit and ASR instruction vary depending on the module and characteristic evaluated. ASR seems to outperform traditional instruction when targeting specific phonemes, especially in the short term, while the explicit instruction group saw longer-term gains with regard to comprehensibility. Holistically, the data suggest that ASR-based instruction shows promise for improving certain aspects of pronunciation, but that using both techniques in tandem would be the most strategic approach to developing this fundamental aspect of learner speech. The data presented here highlight the role and effectiveness of computer-assisted pronunciation training for lower-level Spanish courses.

Author Biographies

Christina Garcia, Saint Louis University

Christina García, Assistant Professor of Spanish and Linguistics at Saint Louis University, is a sociolinguist and phonetician interested in phonetic variation, sociophonetic perception, and L2 pronunciation acquisition. She has done fieldwork in Argentina and Ecuador, examining how sounds are socially meaningful and contribute to the formation of regional identities, and her research on L2 pronunciation harnesses technological tools to provide diverse types of pronunciation feedback to learners. Her work has been published in journals such as Language Variation and Change and Studies in Hispanic and Lusophone Linguistics, and brings cutting-edge techniques used in sociophonetics to the forefront of Hispanic Linguistics.

Dan Nickolai, Saint Louis University

Dan Nickolai is an Assistant Professor of French and the Director of the Language Resource Center at Saint Louis University. He has educational and professional backgrounds in the fields of Computer Science and Second Language Acquisition. His current research and development efforts are focused on designing software platforms that automate the evaluation and instruction of second languages. He is a familiar face on the CALL conference circuit, and his tools have been used in over 50 countries by tens of thousands of language students and educators.

Lillian Jones, University of California, Davis

Lillian Jones is a doctoral student at the University of California, Davis, studying Hispanic Linguistics and Second Language Acquisition. Her research interests include the pedagogical applications of text messaging and social media, computer-mediated communication, computer-assisted language learning, and online and hybrid teaching. Lillian’s MA research paper explored the effect of text messaging on adult linguistic production. She has also published work regarding approaches to integrating emoji into L2 lessons, and is currently involved in providing curriculum and user experience support in the development of an open-source, digital vocabulary program.





How to Cite

Garcia, C., Nickolai, D., & Jones, L. (2020). Traditional Versus ASR-Based Pronunciation Instruction: An Empirical Study. CALICO Journal, 37(3), 213–232.