Concgrams and Writing Quality in Test Compositions


  • Yushan Ke, Soochow University



phraseology, concgram, corpora, meaning-shift units


Phraseology has attracted growing attention in English writing research in recent years. However, the focus has primarily been on items with little variability, such as n-grams or lexical bundles. To address this gap, this study investigates concgrams (Cheng et al., 2006), which allow both constituency variation (intervening words) and positional variation (changes in word order), in advanced-level General English Proficiency Test (GEPT) compositions. One hundred GEPT compositions were divided into two proficiency groups based on their scores and analyzed using the corpus tool ConcGram 1.0. The phraseological characteristics of concgrams are explored from four perspectives: frequency, type-token ratios (TTRs), word associations, and configuration. The goal is to determine how relevant the use of concgrams is to writing evaluation. The results indicate that TTRs and configuration play minor roles, while frequency and word associations appear to be more relevant to excellent writing. This study highlights the importance of including more variable phraseological units and offers a new approach to investigating phraseological units in greater depth.
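As a rough illustration of what counting concgrams involves, the following minimal Python sketch finds co-occurrences of two words within a fixed span, regardless of order or intervening words, and computes a type-token ratio over the resulting configurations. It is not the algorithm used by ConcGram 1.0 or the procedure followed in the study; the function names, the span of four tokens, and the sample sentence are all assumptions made for illustration.

```python
from collections import Counter


def concgram_instances(tokens, origin, associate, span=4):
    """Collect co-occurrences of `origin` and `associate` within `span` tokens.

    Each hit is (order, gap): `order` records positional variation,
    `gap` (number of intervening tokens) records constituency variation.
    The default span of 4 is an illustrative assumption.
    """
    hits = []
    for i, tok in enumerate(tokens):
        if tok != origin:
            continue
        lo = max(0, i - span)
        hi = min(len(tokens), i + span + 1)
        for j in range(lo, hi):
            if j != i and tokens[j] == associate:
                order = "origin-first" if i < j else "associate-first"
                hits.append((order, abs(j - i) - 1))
    return hits


def type_token_ratio(items):
    """Number of distinct items divided by total items (0.0 if empty)."""
    items = list(items)
    return len(set(items)) / len(items) if items else 0.0


# Hypothetical sample text, not taken from the GEPT data.
text = "play an important role . the role teachers play matters . play a key role"
tokens = text.lower().split()

hits = concgram_instances(tokens, "play", "role")
configurations = Counter(hits)  # frequency of each configuration
ttr = type_token_ratio(hits)    # distinct configurations / all instances

print(configurations)
print(round(ttr, 3))
```

In this toy example "play ... role" and "role ... play" are treated as instances of the same concgram, which is the kind of variation that contiguous n-gram counts would miss.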

Author Biography

  • Yushan Ke, Soochow University

    Yushan Ke is an Assistant Professor in the Language Centre at Soochow University, Taiwan. Her research interests include corpus linguistics, second language acquisition, English writing, English for specific purposes and English for academic purposes, content and language integrated learning, and teaching English as a foreign language.


Ahmad, K. (2005). Terminology in texts. The Tuscan Word Centre International Workshop: Dial a Corpus, Certosa di Pontignano, Italy.

Ang, L. H., & Tan, K. H. (2019). From lexical bundles to lexical frames: Uncovering the extent of phraseological variation in academic writing. 3L: Language, Linguistics, Literature, 25(2), 99–112.

Anthony, L. (2020). AntGram (Version 1.2.2.) [Computer Software]. Tokyo, Japan: Waseda University. Available from

Biber, D. (2009). A corpus-driven approach to formulaic language in English: Multi-word patterns in speech and writing. International Journal of Corpus Linguistics, 14(3), 275–311.

Biber, D., Johansson, S., Leech, G., Conrad, S., & Finegan, E. (1999). Longman grammar of spoken and written English (hard copy). Harlow: Pearson Education.

Brezina, V., Weill-Tessier, P., & McEnery, A. (2020). LancsBox v. 5.x. (5.x.) [Computer Software]. Lancaster University.

Cheng, W. (2007). Concgramming: A corpus-driven approach to learning the phraseology of discipline-specific texts. CORELL: Computer Resources for Language Learning, 1, 22–35.

Cheng, W. (2010). Hong Kong Engineering Corpus: Empowering professionals-in-training to learn the language of their profession. In M. C. Campoy, B. Belles-Fortuno, & M. L. Gea-Valor (Eds.), Corpus-based approaches to English language teaching (pp. 67–78). London: Bloomsbury Publishing.

Cheng, W., & Leung, S. N. M. (2012). Exploring phraseological variations by concgramming: The realization of complete patterns of variations. Linguistic Research, 29(3), 617–638.

Cheng, W., Greaves, C., & Warren, M. (2006). From n-gram to skipgram to concgram. International Journal of Corpus Linguistics, 11(4), 411–435.

Cheng, W., Greaves, C., Sinclair, J., & Warren, M. (2009). Uncovering the extent of the phraseological tendency: Towards a systematic analysis of concgrams. Applied Linguistics, 30(2), 236–252.

Christiansen, M., & Arnon, I. (2017). More than words: The role of multiword sequences in language learning and use. Topics in Cognitive Science, 9(3), 542–551.

Cortes, V. (2004). Lexical bundles in published and student disciplinary writing: Examples from history and biology. English for Specific Purposes, 23(4), 397–423.

Cowie, A. (Ed.). (2005). Phraseology: Theory, analysis, and applications. Oxford: Oxford University Press.

Cowie, A. P., & Howarth, P. (1996). Phraseological competence and written proficiency. British Studies in Applied Linguistics, 11, 80–93.

Coxhead, A. (2008). Phraseology and English for academic purposes. In F. Meunier & S. Granger (Eds.), Phraseology in language learning and teaching (pp. 149–161). Amsterdam: John Benjamins.

Crossley, S., Salsbury, T., & McNamara, D. (2012). Predicting the proficiency level of language learners using lexical indices. Language Testing, 29(2), 243–263.

Cumming, A., Kantor, R., Baba, K., Erdosy, U., Eouanzoui, K., & James, M. (2005). Differences in written discourse in independent and integrated prototype tasks for next generation TOEFL. Assessing Writing, 10(1), 5–43.

De Cock, S., Granger, S., Leech, G., & McEnery, T. (1998). An automatic approach to the phrasicon of EFL learners. In S. Granger (Ed.), Learner English on computer (pp. 67–79). London: Longman.

Durrant, P., & Schmitt, N. (2009). To what extent do native and non-native writers make use of collocations? International Review of Applied Linguistics in Language Teaching, 47(2), 157–177.

Garner, J., Crossley, S., & Kyle, K. (2019). N-gram measures and L2 writing proficiency. System, 80, 176–187.

Gebril, A., & Plakans, L. (2016). Source-based tasks in academic writing assessment: Lexical diversity, textual borrowing and proficiency. Journal of English for Academic Purposes, 24, 78–88.

González, M. (2017). The contribution of lexical diversity to college-level writing. TESOL Journal, 8(4), 899–919.

Granger, S., & Bestgen, Y. (2014). The use of collocations by intermediate vs. advanced non-native writers: A bigram-based study. International Review of Applied Linguistics in Language Teaching, 52(3).

Granger, S., & Paquot, M. (2008). Disentangling the phraseological web. In S. Granger & F. Meunier (Eds.), Phraseology: An interdisciplinary perspective (pp. 27–50). Amsterdam: John Benjamins.

Grant, L., & Ginther, A. (2000). Using computer-tagged linguistic features to describe L2 writing differences. Journal of Second Language Writing, 9(2), 123–145.

Greaves, C. (2009). ConcGram 1.0: A phraseological search engine. Amsterdam: John Benjamins Publishing Company.

Greaves, C., & Warren, M. (2007). Concgramming: A computer driven approach to learning the phraseology of English. ReCALL, 19(3), 287–306.

Gregori-Signes, C., & Clavel-Arroitia, B. (2015). Analysing lexical density and lexical diversity in university students’ written discourse. Procedia—Social and Behavioral Sciences, 198, 546–556.

Howarth, P. (1998). Phraseology and second language proficiency. Applied Linguistics, 19(1), 24–44.

Howarth, P. (2005). The phraseology of learners’ academic writing. In A. Cowie (Ed.), Phraseology: Theory, analysis, and applications (pp. 161–188). Oxford: Oxford University Press.

Hyland, K. (2008). As can be seen: Lexical bundles and disciplinary variation. English for Specific Purposes, 27(1), 4–21.

Hyland, K. (2012). Bundles in academic discourse. Annual Review of Applied Linguistics, 32, 150–169.

Kyle, K., & Crossley, S. (2016). The relationship between lexical sophistication and independent and source-based writing. Journal of Second Language Writing, 34, 12–24.

Lenko-Szymanska, A. (2014). The acquisition of formulaic language by EFL learners: A cross-sectional and cross-linguistic perspective. International Journal of Corpus Linguistics, 19(2), 225–251.

Paquot, M. (2018). Phraseological competence: A missing component in university entrance language tests? Insights from a study of EFL learners’ use of statistical collocations. Language Assessment Quarterly, 15(1), 29–43.

Paquot, M. (2019). The phraseological dimension in interlanguage complexity research. Second Language Research, 35(1), 1–25.

Pawley, A., & Syder, F. (1983). Two puzzles for linguistic theory: Nativelike selection and nativelike fluency. In J. C. Richards & R. W. Schmidt (Eds.), Language and communication (pp. 191–226). London: Longman.

Ramisch, C., Villavicencio, A., & Boitet, C. (2010). mwetoolkit: A framework for multiword expression identification. Proceedings of the International Conference on Language Resources and Evaluation, 17–23 May, Valletta, Malta.

Rayson, P. (2010). Log-likelihood wizard. Log-likelihood calculator.

Read, J. (2000). Assessing vocabulary. Cambridge: Cambridge University Press.

Read, J. (2004). Plumbing the depths: How should the construct of vocabulary knowledge be defined. In P. Bogaards & B. Laufer (Eds.), Vocabulary in a second language: Selection, acquisition, and testing (pp. 209–227). Amsterdam: John Benjamins.

Richards, B. (1987). Type/token ratios: What do they really tell us? Journal of Child Language, 14(2), 201–209.

Römer, U. (2010). Establishing the phraseological profile of a text type: The construction of meaning in academic book reviews. English Text Construction, 3(1), 95–119.

Rubin, R. (2019). Phraseological complexity as an index of L2 Dutch writing proficiency. Vocab@Leuven2019 Conference, 1–3 July, Leuven.

Salazar, L., & Joy, D. (2011). Lexical bundles in scientific English: A corpus-based study of native and non-native writing. TDX (Tesis Doctorals en Xarxa), University of Barcelona.

Schmidt, R. (2001). Attention. In P. Robinson (Ed.), Cognition and second language instruction (pp. 3–32). Cambridge: Cambridge University Press.

Sinclair, J. (1991). Corpus, concordance, collocation. Oxford: Oxford University Press.

Sinclair, J. (1996). The search for units of meaning. Textus, 9(1), 75–106.

Sinclair, J. (2004). Trust the text: Language, corpus and discourse. London: Routledge.

Sinclair, J. (2005). Document relativity [Unpublished manuscript].

Sinclair, J. (2007a). Collocation reviewed [Unpublished manuscript]. Tuscan Word Centre, Italy.

Sinclair, J. (2007b). Defining the definiendum—new. Unpublished manuscript.

Treffers-Daller, J., Parslow, P., & Williams, S. (2018). Back to basics: How measures of lexical diversity can help discriminate between CEFR levels. Applied Linguistics, 39(3), 302–327.

Vandeweerd, N., Housen, A., & Paquot, M. (2019). Phraseological complexity as an index of L2 French writing proficiency. 5th Learner Corpus Research Conference, 12–14 September, Warsaw.

Vetchinnikova, S. (2019). Phraseology and the advanced language learner. New York: Cambridge University Press.

Vidakovic, I., & Barker, F. (2010). Use of words and multi-word units in skills for life writing examinations. University of Cambridge ESOL Research Notes, 41, 7–14.

Wang, X. (2014). The relationship between lexical diversity and EFL writing proficiency. University of Sydney Papers in TESOL, 9.

Warren, M. (2011). Using corpora in the learning and teaching of phraseological variation. In A. Frankenberg-Garcia, L. Flowerdew, & G. Aston (Eds.), New trends in corpora and language learning (pp. 153–166). London: Bloomsbury Publishing.

Yu, G. (2010). Lexical diversity in writing and speaking task performances. Applied Linguistics, 31(2), 236–259.

Zipf, G. K. (1949). Human behavior and the principle of least effort. Cambridge, MA: Addison-Wesley.






How to Cite

Ke, Y. (2024). Concgrams and Writing Quality in Test Compositions. CALICO Journal, 41(2), 188–208.