This journal is devoted to exploring how quantitative methods and statistical techniques can supplement qualitative analyses in linguistics and communication science, and to research on the quantitative characteristics of language and text in a more mathematical form.

Preliminaries

William J. Boone, John R. Staver and Melissa S. Yale. Rasch Analysis in the Human Sciences

William J. Boone, John R. Staver and Melissa S. Yale. Rasch Analysis in the Human Sciences. The Netherlands: Springer, 2014. xvi+482 pp. ISBN 978-94-007-6856-7.

Eckert, P. Meaning and Linguistic Variation: The Third Wave in Sociolinguistics

Eckert, P. (2018). Meaning and Linguistic Variation: The Third Wave in Sociolinguistics. Cambridge: Cambridge University Press. ISBN: 9781316403242

Exploring Meta-analysis for Historical Corpus Linguistics Based on Linked Data

Empirical work on English historical corpus linguistics is plentiful but fragmented, and some of it is hard to come by. This paper proposes a solution for making it more accessible and reusable for meta-analysis. We present an online Language Change Database (LCD), which provides comparative, real-time baseline data from earlier corpus-based studies. LCD entries summarize the findings and include numerical data from the articles. We discuss the LCD from the perspective of database design and linked data management. Furthermore, we illustrate the reuse of LCD data through a meta-analysis of the history of English connectives. For this purpose, we have developed an application called the LCD Aggregated Data Analysis workbench (LADA). We show how researchers can use LADA to filter, refine and visualize LCD data. Thus, we pave the way toward a future in which both research results and research data are regularly available for verification, validation and reuse.

The Effect of Noun Phrase Grammar on the Affective Meaning of Social Identity Concepts

We examine the influence of determiners (a/an, the, and all) and grammatical number (singular or plural) on the affective meaning of social identity concepts. Some linguistic evidence suggests that changing the grammatical form of a noun phrase may shift its affective meaning, while other research highlights the importance of context for such shifts. Drawing on affect control theory (ACT), a social psychological theory of culture and language, we conceptualize and measure affective meaning in terms of evaluation (goodness), potency, and activity. In two experiments, participants rate 28 social identity concepts, either count or collective nouns, each presented in one of five grammatical forms. Consistent with ACT, the data indicate that the bulk of a concept's affective meaning is carried by the noun itself rather than by the grammatical features of the noun phrase in which the concept is expressed.

The Discriminatory Power of Lexical Context for Alternations:

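Steps (a) to (c) in the abstract that follows can be illustrated with a minimal Python sketch. It assumes, as illustrative choices not taken from the paper, that the lexical context of each instance is a bag of words, that a prototype is the add-one-smoothed pooled word distribution of all instances of one alternant, and that divergence is measured with Kullback-Leibler divergence in bits. The function names and the toy genitive data are hypothetical.

```python
import math
from collections import Counter

def smoothed_distribution(counts, vocab, alpha=1.0):
    """Add-alpha smoothed probabilities over a fixed vocabulary (no zero cells)."""
    total = sum(counts[w] for w in vocab) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}

def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in bits; p and q share one vocabulary."""
    return sum(p[w] * math.log2(p[w] / q[w]) for w in p)

def prototype_divergences(instances):
    """instances: list of (alternant, context_words) pairs.

    Step (a): pool the context words of every instance of an alternant into
    that alternant's prototype distribution (an assumed operationalization).
    Step (b): score each instance by its divergence from every prototype.
    The returned rows are candidate predictors for step (c)."""
    vocab = sorted({w for _, ctx in instances for w in ctx})
    pooled = {}
    for alternant, ctx in instances:
        pooled.setdefault(alternant, Counter()).update(ctx)
    prototypes = {a: smoothed_distribution(c, vocab) for a, c in pooled.items()}
    rows = []
    for _, ctx in instances:
        p = smoothed_distribution(Counter(ctx), vocab)
        rows.append({a: kl_divergence(p, proto) for a, proto in prototypes.items()})
    return rows

# Toy of- vs. s-genitive contexts, invented for illustration only.
data = [
    ("of", ["the", "roof", "house"]),
    ("of", ["the", "end", "story"]),
    ("s", ["john", "car", "new"]),
    ("s", ["mary", "car", "old"]),
]
for (alternant, _), row in zip(data, prototype_divergences(data)):
    print(alternant, {a: round(d, 3) for a, d in row.items()})
```

Each instance here also contributes to its own prototype; a real analysis would probably build leave-one-out prototypes so that the divergence scores do not leak the outcome into the predictors.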
This paper makes a very exploratory, tentative, thinking-aloud suggestion for the corpus-based analysis of alternation data. I start from the observation that studies of alternations/choices, in particular in corpus linguistics, have become increasingly sophisticated in terms of the statistical methods they employ and the number of predictors they involve. While the predictors employed come from many different levels of linguistic analysis (phonology, morphosyntax, semantics, pragmatics/discourse, textual, psycholinguistic, sociolinguistic, and others), they are usually contextual in nature, meaning they characterize the context of the choice the language user needs to make or has just made. However, one aspect of the context seems to be crucially underutilized when it comes to modeling speakers' choices: the lexical context. In this paper, I build on recent work in computational psycholinguistics to (a) define a lexical-distribution prototype for each of the (typically, but not necessarily, two) alternants of an alternation, and (b) compute the degree to which each instance of the alternation in question diverges from each of the prototypes. Then, (c) the divergence scores of every instance from each prototype are entered, alongside all other predictors, into statistical models, where they minimally serve as control variables for whatever information is contained in the lexical context of the speaker's choice. I exemplify the approach and its sometimes amazing predictive power on the basis of a choice between near synonyms, two morphosyntactic alternations (preposition stranding vs. pied-piping, and of- vs. s-genitives), and a distinction between the functions of well.

Adjusting Regression Models for Overfitting in Second Language Research

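The bootstrap validation discussed in the abstract that follows comes in several variants. The minimal Python sketch below shows one widely used variant, the optimism-corrected bootstrap: refit the model on each resample, measure how much better it fits its own resample than the original data, and subtract the average of that gap (the optimism) from the apparent R². It assumes ordinary least squares with an intercept and R² as the performance measure; the function names and the simulated data are hypothetical and not taken from the paper.

```python
import random

def ols_fit(X, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    rows = [[1.0] + list(x) for x in X]
    k = len(rows[0])
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    b = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    # Gaussian elimination with partial pivoting on X'X beta = X'y.
    for col in range(k):
        piv = max(range(col, k), key=lambda i: abs(A[i][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, k):
            f = A[r][col] / A[col][col]
            for c in range(col, k):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    beta = [0.0] * k
    for i in reversed(range(k)):
        beta[i] = (b[i] - sum(A[i][j] * beta[j] for j in range(i + 1, k))) / A[i][i]
    return beta

def r_squared(beta, X, y):
    """Proportion of variance in y explained by the linear predictor beta."""
    preds = [beta[0] + sum(bj * xj for bj, xj in zip(beta[1:], x)) for x in X]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, preds))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

def optimism_corrected_r2(X, y, n_boot=200, seed=1):
    """Return (apparent R^2, bootstrap optimism-corrected R^2)."""
    rng = random.Random(seed)
    n = len(y)
    apparent = r_squared(ols_fit(X, y), X, y)
    optimism = 0.0
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample cases with replacement
        Xb, yb = [X[i] for i in idx], [y[i] for i in idx]
        beta_b = ols_fit(Xb, yb)
        # Fit on the resample it was trained on minus fit on the original data.
        optimism += r_squared(beta_b, Xb, yb) - r_squared(beta_b, X, y)
    return apparent, apparent - optimism / n_boot

# Simulated small-sample study: 30 cases, 5 predictors, only the first one matters.
data_rng = random.Random(0)
X = [[data_rng.gauss(0, 1) for _ in range(5)] for _ in range(30)]
y = [0.5 * x[0] + data_rng.gauss(0, 1) for x in X]
apparent, corrected = optimism_corrected_r2(X, y)
print(f"apparent R^2 = {apparent:.3f}, optimism-corrected R^2 = {corrected:.3f}")
```

With few cases and several predictors, the corrected value is typically noticeably lower than the apparent one, which is the kind of inflation the abstract warns about.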
Regression modeling is an increasingly important quantitative tool in second language (L2) research. While superior in many ways to more traditional methods such as ANOVA, regression modeling, like all procedures, has limitations, and its application faces problems ranging from small sample sizes to a lack of screening for outliers and influential data points (Plonsky and Ghanbar, 2018). Because these problems are common in L2 research, there is concern that existing studies using regression may overfit the data, perhaps inflating effect size estimates. These issues can be partially alleviated through robust statistical procedures such as model validation. This paper provides L2 researchers with an overview of these issues and an instructive look at one robust validation method: bootstrapping.

A Case Study on Some Frequent Concepts in Works of Poetry

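The abstract that follows identifies phraseological units through their frequency. A common way to operationalize this is to count contiguous word n-grams ("clusters") and keep those above a frequency threshold. The Python sketch below does exactly that; the tokenization regex, the n-gram range of 2 to 4 words, the threshold, and the function name are all assumptions for illustration, not the study's actual settings.

```python
import re
from collections import Counter

def frequent_clusters(text, n_min=2, n_max=4, min_freq=2):
    """Count contiguous word n-grams (clusters) and keep those at or above min_freq."""
    # Assumed tokenization: lowercase letter sequences, apostrophes kept.
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    # most_common() sorts by descending frequency.
    return [(gram, c) for gram, c in counts.most_common() if c >= min_freq]

# A toy string, not corpus data.
print(frequent_clusters("the sky above the world and the sky below the world"))
```

Comparing poetry with prose would additionally require normalizing the counts by corpus size (for example per million words) before contrasting the two cluster lists.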
This paper looks at a corpus of British and US poetry, uncovering phraseological units which, through their frequency, are indicators of key concepts. Multi-word units (MWUs) have been discussed extensively in corpus-based research, for example by Sinclair (1996) [2004] and Biber and Conrad (1999), or, under the label of formulaicity, by Wray (2002); O'Keeffe et al. (2007), Greaves and Warren (2010) and Pace-Sigge (2015) describe the MWUs preferred in different spoken and written genres. So far, however, there has been very little research into the extent to which MWUs appear in poetry. A commonly held view is that poetry by definition should not yield patterns: it subverts every (linguistic) pattern that it can. By focusing on the main themes surfacing in multi-word units, this research examines in depth the usages found in poetic texts and compares the sets of words found with their occurrence patterns in prose literature. Key issues are highlighted through a number of theme-based case studies, focusing on the themes of world and sky. Results show that common clusters occur in both the poetry and the prose corpora: it is the depth of usage that marks their divergence.

How to Set Delta in the Two-One-Sided T-tests Procedure (TOST)

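For readers unfamiliar with the procedure in the abstract that follows, the minimal Python sketch below runs the two one-sided tests for a fixed, hand-chosen delta; it does not reproduce the article's method for choosing delta. It assumes two independent samples, a pooled-variance Student t statistic, and delta expressed as a raw mean difference. The Student t CDF is computed by Simpson integration of the density using only the standard library; the function names and toy data are hypothetical.

```python
import math
from statistics import mean, variance

def t_cdf(t, df, steps=2000):
    """Student t CDF via Simpson integration of the density from 0 to |t| (steps even)."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)
    pdf = lambda x: c * (1 + x * x / df) ** (-(df + 1) / 2)
    h = abs(t) / steps
    s = pdf(0.0) + pdf(abs(t)) + sum((4 if i % 2 else 2) * pdf(i * h) for i in range(1, steps))
    area = s * h / 3
    return 0.5 + area if t >= 0 else 0.5 - area

def tost(x, y, delta):
    """Two one-sided pooled-variance t-tests for equivalence within (-delta, +delta).

    Returns the larger of the two one-sided p-values; equivalence is concluded
    when it falls below the chosen alpha."""
    n1, n2 = len(x), len(y)
    df = n1 + n2 - 2
    diff = mean(x) - mean(y)
    sp2 = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / df
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    p_lower = 1 - t_cdf((diff + delta) / se, df)  # H0: diff <= -delta
    p_upper = t_cdf((diff - delta) / se, df)      # H0: diff >= +delta
    return max(p_lower, p_upper)

# Toy measurements, invented for illustration only.
x = [5.1, 4.9, 5.0, 5.2, 4.8, 5.0]
y = [5.0, 5.1, 4.9, 5.0, 5.1, 4.9]
for delta in (0.5, 0.01):
    p = tost(x, y, delta)
    print(f"delta={delta}: p={p:.4f} ->", "equivalent" if p < 0.05 else "equivalence not shown")
```

The same data support equivalence under a generous delta but not under a very strict one, which illustrates why the choice of delta matters as much as the data.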
The Two-One-Sided T-tests procedure (TOST) is used to show that two samples are equivalent or similar, in contrast to classical statistical tests, which test for a difference. The TOST relies on a parameter called delta, the equivalence margin, which has to be set by the researcher using their intuition. Doing so can be difficult because of complex interactions between the relevant parameters. In this article we present a method for setting delta, which is established and validated through extensive simulations based on real data sets from linguistics and other sciences. The method is shown to be sound and reliable, although we cannot exclude deviant model behaviour for very small samples (N ≤ 10) and very large samples (N > 100,000).

Analysis of the Production of Pronominal Constructions in Spanish in a Learner Corpus

It is a well-established fact that pronominal constructions are one of the most difficult areas for learners of Spanish as a second language (Tremblay, 2006; Toth, 2000). This study aims to contribute to language learning and materials design by drawing a general picture of learners' production of pronominal constructions using corpus linguistics methodology, following the recommendations of Zyzik (2006). The analysis takes three aspects into consideration: the students' L1, the semantics of each construction, and the type of errors found. For this purpose, a corpus of 2,532 authentic pronominal sentences written by learners of Spanish at levels B1 and C1 was compiled. Findings show that students with a Romance L1 do not perform better than students with a non-Romance L1. The semantics of the different types of pronominal construction also seems to influence the students' performance. Finally, errors of overgeneralization are more common than errors of omission; moreover, errors of omission decrease at higher levels, whereas errors of overgeneralization seem to increase. These results may be used to reconsider some aspects of the teaching of Spanish as a foreign language.

This technique also provides exciting opportunities for researchers to ask new questions that could not be addressed in a straightforward manner with traditional statistics. With this technique, researchers are able to investigate differential effects of a predictor on different outcomes. Through a demonstration in R using published, open eye-tracking data, I contextualize my discussion of the technique, also offering practical, step-by-step, annotated guidelines for interested researchers.

Published 2019-12-19. Copyright (c) 2019 Equinox Publishing Ltd.