The Corpus of Galician/Spanish Bilingual Speech of the University of Vigo
Codes tagging and automatic annotation
Keywords: sociology, social psychology, anthropology, sociolinguistics
First, we briefly describe the research project: the Corpus of Galician/Spanish Bilingual Speech (Corpus de Fala Bilingüe Galego/Castelán, abbreviated as CoFaBil), currently being compiled at the University of Vigo. This ethnographic, conversation-based corpus has been recorded in a wide range of informal, spontaneous communicative situations and then transcribed in detail following the conventions normally applied in conversation analysis. Second, we explain the manual annotation of the corpus. The CHAT annotation system, used to tag this corpus, requires specifying the linguistic-communicative code to which each word belongs, and we discuss the problems that this word-by-word tagging raises. These problems involve phenomena characteristic of both bilingual conversation and languages in contact, with the added difficulty that the small interlinguistic distance between the varieties of Galician and Spanish calls for certain tagging values (presented in the text) that reflect the complex nature of the phenomena detected. Third, we present the solutions devised for the automatic annotation of the corpus. The most important result is the computer application Anotador 1.0, which makes it possible to annotate a substantial part of the phenomena appearing in CoFaBil more quickly while removing the interpretative biases involved in human annotation. Moreover, because the tool is versatile, it can be used to annotate bilingual speech corpora for any pair of languages.
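The word-by-word code tagging described above can be illustrated with a minimal sketch. It is not the method used by Anotador 1.0, and it does not reproduce the tagging values adopted for CoFaBil: the two toy word lists, the tag labels (`gl`, `es`, `shared`, `unk`) and the function names are all hypothetical. It only shows why closely related languages make per-word tagging hard: a form such as "casa" exists in both lexicons and cannot be assigned to one code by lookup alone.

```python
# Hypothetical toy lexicons; a real annotator would load full word lists.
GALICIAN = {"non", "falar", "na", "unha", "casa"}
SPANISH = {"no", "quiero", "hablar", "una", "casa"}


def tag_word(word):
    """Return a hypothetical code tag for one word via lexicon lookup."""
    w = word.lower()
    in_gl = w in GALICIAN
    in_es = w in SPANISH
    if in_gl and in_es:
        return "shared"  # form exists in both languages: ambiguous
    if in_gl:
        return "gl"
    if in_es:
        return "es"
    return "unk"  # not found in either toy lexicon


def tag_utterance(utterance):
    """Tag every whitespace-separated word of an utterance."""
    return [(w, tag_word(w)) for w in utterance.split()]


if __name__ == "__main__":
    for word, tag in tag_utterance("non quiero falar na casa"):
        print(f"{word}\t{tag}")
```

In this sketch the words tagged `shared` or `unk` are the ones that would still require a human decision or additional context.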
© Equinox Publishing Ltd.