Annotating thematic features in English and Spanish

A contrastive corpus-based study


  • Jorge Arús Universidad Complutense de Madrid
  • Julia Lavid Universidad Complutense de Madrid
  • Lara Moratón Universidad Complutense de Madrid



annotation scheme, discourse analysis, Theme


This paper presents the preliminary results of an empirical study designed to test contrastive features of the category of Theme in English and Spanish through corpus analysis and manual annotation. Taking as its theoretical basis the more general features of the model of thematisation proposed in Lavid, Arús and Zamorano (2010), the paper describes the steps of the methodology: the selection of the corpus used as a ‘training suite’, the design of the annotation scheme, and a discussion of the results of the two annotation experiments carried out so far to test the reproducibility of the scheme. The work reported here is expected to have a theoretical impact on the area of contrastive corpus studies and to serve as the basis for the (semi-)automatic annotation of thematic features in larger bilingual corpora.

Author Biographies

Jorge Arús, Universidad Complutense de Madrid

Jorge Arús Hita teaches English language and linguistics at the Facultad de Filología Inglesa, Universidad Complutense de Madrid. His publications include articles on contrastive linguistics and second-language teaching, within the systemic functional framework, in various national and international journals and edited volumes. He has been copy-editor of Atlantis and is currently b-learning coordinator at the School of Language of Linguistics, UCM.

Julia Lavid, Universidad Complutense de Madrid

Julia Lavid is Full Professor in English Linguistics and Head of the Department of English Philology I, Universidad Complutense de Madrid (Spain), where she teaches several courses on English Linguistics, Computational and Corpus Linguistics, and the contrastive analysis and translation of English and Spanish. She has been team leader of several international projects financed by the European Commission and is now the team leader of a research group on Functional Linguistics and its Applications at UCM, with the participation of both national and international researchers. Her research expertise lies in functional and corpus-based approaches to the study of English in contrast with other languages, as well as their application to educational and computational contexts. Her most recent research focuses on the creation and validation of English-Spanish contrastive descriptions through corpus analysis and annotation, financed by the Spanish Ministry of Science and Innovation within the CONTRANOT project. She has an extensive record of publications in international volumes and is the author of the book Lenguaje y Nuevas tecnologías: Nuevas Perspectivas, Métodos y Herramientas para el lingüista del siglo XXI (Madrid, Cátedra, 2005), and coauthor of the research monograph Systemic-Functional Grammar of Spanish: a Contrastive Account with English (London: Continuum, 2010).

Lara Moratón, Universidad Complutense de Madrid

Lara Moratón is affiliated to Universidad Complutense de Madrid.


Arnaiz, A. R. (1997) An overview of the main word order characteristics of Romance. In A. Siewierska (ed.) Constituent Order in the Languages of Europe, 47–73. Berlin: Mouton de Gruyter.

Arús, J. (2010) On Theme in English and Spanish: A comparative study. In E. Swain (ed.) Thresholds and Potentialities of Systemic Functional Linguistics: Multilingual, Multimodal and Other Specialised Discourses, 23–48. Trieste: EUT.

Arús, J. (2007) On the aboutness of Theme. In M. Losada, P. Ron, S. Hernández and J. Casanova (eds) Proceedings of the 30th International AEDEAN Conference (CD-ROM).

Berry, M. (1989) Thematic options and success in writing. In C. Butler, R. Cardwell and J. Cardwell (eds) Language and Literature: Theory and Practice. A Tribute to Walter Grauberg, 62–80. Nottingham: University of Nottingham.

Fawcett, R. (2007) The many types of ‘Theme’ in English: their semantic systems and their functional syntax. Retrieved on 10 June 2010 from

Halliday, M. A. K. and Matthiessen, C. M. I. M. (2004) Introduction to Functional Grammar. London: Arnold.

Hausser, R. (2001) Foundations of Computational Linguistics. Berlin: Springer.

Krippendorff, K. (2007) Computing Krippendorff’s Alpha-Reliability. Retrieved on 21 March 2010 from

Lavid, J. (2010) Contrasting choices in clause-initial position in English and Spanish: A corpus-based analysis. In E. Swain (ed.) Thresholds and Potentialities of Systemic Functional Linguistics: Multilingual, Multimodal and Other Specialised Discourses, 49–68. Trieste: EUT.

Lavid, J. (2000a) Contextual constraints on thematisation in written discourse: an empirical study. In P. Bonzon, M. Cavalcanti and R. Nossum (eds) Formal Aspects of Context, 37–47. Dordrecht/Boston/London: Kluwer Academic Publishers.

Lavid, J. (2000b) Text types, chaining strategies and Theme in a multilingual corpus: A cross-linguistic comparison for text generation. In J. Bregazzi, A. Downing, D. López and J. Neff (eds) Estudios de Filología Inglesa: Homenaje a Jack White, 107–121. Madrid: Editorial Complutense.

Lavid, J. (1998) The relevance of corpus-based research for contrastive linguistics and computational studies: Thematisation as an example. In M. T. Turell and E. Vallduví (eds) IV i V Jornades de corpus lingüistics (1996–1997): els corpus en la recerca semàntica i pragmàtica, 117–140. Barcelona: Publicaciones del Instituto Universitario de Lingüística Aplicada, Universidad Pompeu Fabra.

Lavid, J., Arús, J. and Zamorano, J. R. (2010a) Systemic-Functional Grammar of Spanish: A Contrastive Account with English. London: Continuum.

Lavid, J., Arús, J. and Moratón, L. (2010b) Signalling genre through Theme: The case of news reports and commentaries. In L-M. Ho-Dac (ed.) Proceedings of the 8th MAD: Signalling Text Organisation, 82–92. Moissac (France): University of Toulouse. Available at

Leech, G. (1997) Introducing corpus annotation. In R. Garside, G. Leech and A. McEnery (eds) Corpus Annotation: Linguistic Information from Computer Text Corpora, 1–19. London: Longman.

Matthiessen, C. M. I. M. (1995) Lexicogrammatical Cartography: English Systems. Tokyo: International Language Science Publishers.

Matthiessen, C. M. I. M. (2006) Frequency profiles of some basic grammar systems. In G. Thompson and S. Hunston (eds) System and Corpus: Exploring Connections, 103–142. London: Equinox.

McCabe, A. and Alonso, I. (2001) Theme, transitivity and cognitive representation in Spanish and English written texts. In CLAC 7/2001. Retrieved on 10 February 2009 from

O’Donnell, M. (2010) UAM Corpus Tool. Available at

Ravelli, L. J. (1995) A dynamic perspective: implications for metafunctional interaction and an understanding of Theme. In R. Hasan and P. H. Fries (eds) On Subject and Theme, 187–234. Amsterdam and Philadelphia, PA: Benjamins.

Rose, D. (2001) Some variation in Theme across languages. Functions of Language 8 (1): 109–145.

Taboada, M. (1995) Theme Markedness in English and Spanish: A Systemic-Functional Approach. Retrieved on 24 September 2010 from

University of Pittsburgh (2010) UCAT Coding Tool. Available at http://cat.ucsur.pi



How to Cite

Arús, J., Lavid, J., & Moratón, L. (2012). Annotating thematic features in English and Spanish: A contrastive corpus-based study. Linguistics and the Human Sciences, 6(1-3), 173–192.



Thematic Structure and Meaning