Using the Developmental Path of Cause to Bridge the Gap between AWE Scores and Writing Teachers’ Evaluations


  • Hong Ma, Iowa State University
  • Tammy Slater, Iowa State University



Keywords: score validation, AWE scores


Supported by artificial intelligence (AI), the most advanced Automatic Writing Evaluation (AWE) systems have gained increasing attention for their ability to provide immediate scoring and formative feedback, yet teachers have been hesitant to use them in their classes because correlations between the grades they assign and the AWE scores have generally been low. This raises the question of where improvements in evaluation may need to be made and what approaches are available to carry out these improvements. This mixed-method study analyzed 59 cause-and-effect essays collected from English language learners enrolled in six different sections of a college-level academic writing course, drawing on the theory proposed by Slater and Mohan (2010) regarding the developmental path of cause. The study compared the results of raters who used this developmental path with the accuracy of AWE scores produced by Criterion, an AWE tool developed by Educational Testing Service (ETS), and with the grades reported by teachers. Findings suggested that if Criterion is to be used successfully in the classroom, writing teachers need to take a meaning-based approach to their assessment, which would allow them and their students to understand more fully how language constructs cause and effect. Using the developmental path of cause as an analytical framework for assessment may then help teachers assign grades that are more in sync with AWE scores, which in turn can help students gain more trust in the scores they receive from both their teachers and Criterion.

Author Biographies

Hong Ma, Iowa State University

Hong Ma is a PhD candidate in Applied Linguistics and Technology at Iowa State University. Her primary research interests lie in computer-assisted language learning and language testing. She is currently leading multiple research projects that aim to develop and evaluate a vocabulary-learning tool and to extract a more pedagogy-informed vocabulary list using a programming language.

Tammy Slater, Iowa State University

Tammy Slater is an associate professor in Applied Linguistics and Technology at Iowa State University. Her research draws upon Systemic Functional Linguistics to understand the development of academic language through content-based and project-based teaching and learning, particularly as it informs English language education.


References

Attali, Y., & Burstein, J. (2005). Automated essay scoring with e-rater version 2.0 (ETS RR-04-45). Princeton, NJ: Educational Testing Service.

Attali, Y., Bridgeman, B., & Trapani, C. (2010). Performance of a generic approach in automated essay scoring. The Journal of Technology, Learning and Assessment, 10(3), 1-17.

Ben-Simon, A., & Bennett, E.R. (2007). Toward more substantively meaningful automated essay scoring. The Journal of Technology, Learning and Assessment, 6(1), Retrieved from

Burstein, J., Kukich, K., Wolff, S., Lu, C., & Chodorow, M. (1998). Computer analysis of essays. Retrieved from

Burstein, J., & Chodorow, M. (1999, June). Automated essay scoring for nonnative English speakers. In Proceedings of the ACL99 Workshop on Computer-Mediated Language Assessment and Evaluation of Natural Language Processing. Retrieved from http://www.

Burstein, J., Chodorow, M., & Leacock, C. (2003). Criterion online essay evaluation: An application for automated evaluation of student essays. Retrieved from

Chapelle, C.A. & Chung, Y. (2010). The promise of NLP and speech processing technologies in language assessment. Language Testing, 27(3), 301-315.

Chapelle, C.A., Jamieson, J. & Enright, M.K. (Eds.). (2008). Building a validity argument for the test of English as a foreign language. London: Routledge.

Chen, C.-F.E., & Cheng, W.-Y.E. (2008). Beyond the design of automated writing evaluation: Pedagogical practices and perceived learning effectiveness in EFL writing classes. Language Learning & Technology, 12(2), 94–112.

Coffin, C. (1997). Constructing and giving value to the past: An investigation into secondary school history. In F. Christie & J.R. Martin (Eds.), Genre and Institutions: Social Processes in the Workplace and School (pp. 196-230). London: Continuum.

Creswell, J.W., & Plano Clark, V.L. (2007). Designing and conducting mixed methods research. Thousand Oaks, CA: SAGE Publications.

Ebyary, K., & Windeatt, S. (2010). The impact of computer-based feedback on students’ written work. International Journal of English Studies, 10(2), 121-142.

Elliot, S. (2001). IntelliMetric: From here to validity. Paper presented at the annual meeting of the American Educational Research Association. Seattle, Washington.

Fitzpatrick, M. (2011). Engaging writing 2: Essential skills for academic writing (2nd ed.). New York: Pearson Longman.

Grimes, D., & Warschauer, M. (2010). Utility in a fallible tool: A multi-site case study of automated writing evaluation. Journal of Technology, Learning, and Assessment, 8(6), 4–44.

Halliday, M.A.K. (1994). An introduction to functional grammar (2nd ed.). New York, NY: Edward Arnold.

Halliday, M.A.K. (1998). Things and relations: Regrammaticising experience as technical knowledge. In J.R. Martin and R. Veel (Eds.), Reading science: Critical and functional perspectives on discourses of science (pp. 185-235). New York: Routledge.

Halliday, M.A.K., & Martin, J.R. (1993). Writing science: Literacy and discursive power. Washington DC: The Falmer Press.

James, C. (2006). Validating a computerized scoring system for assessing writing and placing students in composition courses. Assessing Writing, 11(3), 167-178.

Keith, T.Z. (2003). Validity of automated essay scoring systems. In M.D. Shermis & J.C. Burstein (Eds.), Automated essay scoring: A cross-disciplinary perspective. Mahwah, NJ: Erlbaum.

Lai, Y.-H. (2010). Which do students prefer to evaluate their essays: Peers or computer program. British Journal of Educational Technology, 41(3), 432-454.

Li, Z., Link, S., Ma, H., Yang, H., & Hegelheimer, V. (2014). The role of automated writing evaluation holistic scores in the ESL classroom. System, 44, 66-78.

Link, S., Durson, A., Karakaya, K., & Hegelheimer, V. (2014). Towards best ESL practices for implementing automated writing evaluation. CALICO Journal, 31(3), 323-344.

Mohan, B., Leung, C., & Slater, T. (2010). Assessing language and content: A functional perspective. In A. Paran & L. Sercu (Eds.), Testing the untestable in language education (pp. 219-242). Bristol, UK: Multilingual Matters.

Mohan, B., & Slater, T. (2004). The evaluation of causal discourse and language as a resource for meaning. In J. A. Foley. (Ed.), Language, education & discourse: Functional approaches (pp. 255-269). London: Continuum.

Mohan, B., & Slater, T. (2005). A functional perspective on the critical ‘theory/practice’ relation in teaching language and science. Linguistics and Education, 16, 151-172.

Mohan, B., Slater, T., Luo, L., & Jaipal, K. (2002). Developmental lexicogrammar of causal explanations in science. Paper presented at the International Systemic Functional Linguistics Congress (ISFC29), Liverpool, UK.

Page, E.B. (2003). Project Essay Grade: PEG. In M.D. Shermis & J.C. Burstein (Eds.), Automated essay scoring: A cross-disciplinary perspective (pp. 43-54). Mahwah, NJ: Lawrence Erlbaum Associates.

Page, E.B., & Petersen, N.S. (1995). The computer moves into essay grading: Updating the ancient test. The Phi Delta Kappan, 76(7), 561-565.

Painter, C. (1999). Learning through language in early childhood. London: Continuum.

Petersen, N.S. (1997). Automated scoring of written essays: Can such scores be valid? Paper presented at the annual meeting of the National Council on Measurement in Education, Chicago.

Saldana, J. (2009). The coding manual for qualitative researchers. Washington DC: SAGE.

Slater, T. (2004). The discourse of causal explanations in school science. PhD thesis, University of British Columbia.

Slater, T., & Mohan, B. (2010). Towards systematic and sustained formative assessment of causal explanations in oral interactions. In A. Paran & L. Sercu (Eds.), Testing the untestable in language education (pp. 256-269). Bristol, UK: Multilingual Matters.

Taylor, C., Kirsch, I., & Eignor, D. (1999). Examining the relationship between computer familiarity and performance on computer-based language tasks. Language Learning, 49(2), 219-274.

Veel, R. (1997). Learning how to mean-scientifically speaking: Apprenticeship into scientific discourse in the secondary school. In F. Christie & J.R. Martin (Eds.), Genre and Institutions: Social Processes in the Workplace and School (pp. 161-195). London: Continuum.

Wang, J., & Brown, M.S. (2007). Automated essay scoring versus human scoring: a comparative study. Journal of Technology, Learning, and Assessment, 6(2). Retrieved from

Wang, Y.-J., Shang, H.-F., & Briody, P. (2013). Exploring the impact of using automated writing evaluation in English as a foreign language university students’ writing. Computer Assisted Language Learning, 26(3), 234-257.

Warschauer, M. (2010). Invited commentary: New tools for teaching writing. Language Learning & Technology, 14(1), 3-8.



How to Cite

Ma, H., & Slater, T. (2015). Using the Developmental Path of Cause to Bridge the Gap between AWE Scores and Writing Teachers’ Evaluations. Writing and Pedagogy, 7(2-3), 395-422.



From the e-Sphere