Exploring Meta-analysis for Historical Corpus Linguistics Based on Linked Data


  • Joonas Kesäniemi University of Helsinki
  • Turo Vartiainen University of Helsinki
  • Tanja Säily University of Helsinki
  • Terttu Nevalainen University of Helsinki




corpus linguistics, meta-analysis, open data


Empirical work on English historical corpus linguistics is plentiful but fragmented, and some of it is hard to come by. This paper proposes a solution for making it more accessible and reusable for meta-analysis. We present an online Language Change Database (LCD), which provides comparative, real-time baseline data from earlier corpus-based studies. LCD entries summarize the findings and include numerical data from the articles. We discuss the LCD from the perspective of database design and linked data management. Furthermore, we illustrate the reuse of LCD data through a meta-analysis of the history of English connectives. For this purpose, we have developed an application called the LCD Aggregated Data Analysis workbench (LADA). We show how researchers can use LADA to filter, refine and visualize LCD data. Thus we are paving the way for a future where both research results and research data are regularly available for verification, validation and re-use.


Primary Sources

Rissanen, M. (2002). Despite or notwithstanding? On the development of concessive prepositions in English. In A. Fischer, G. Tottie, and H. M. Lehmann (Eds) Text Types and Corpora: Studies in Honour of Udo Fries, 191-203. Tübingen: Gunter Narr.

Rissanen, M. (2005). The development of till and until in English. In J. Fisiak and H.-K. Kang (Eds) Recent Trends in Medieval English Language and Literature in Honour of Young-Bae Park, Vol. I 75-92. Seoul: Thaehaksa.

Rissanen, M. (2009). Grammaticalisation, contact and adverbial connectives: the rise and decline of save. In S. Watanabe and Y. Hosoya (Eds) English Philology and Corpus Studies: A Festschrift in Honour of Mitsunori Imai to Celebrate His Seventieth Birthday 135-152. Tokyo: Shohakusha.

Rissanen, M. (2011). On the long history of English adverbial subordinators. In A. Meurman-Solin and U. Lenker (Eds) Connectives in Synchrony and Diachrony in European Languages (Studies in Variation, Contacts and Change in English 8). Helsinki: VARIENG. Retrieved on 12 April 2018 from http://www.helsinki.fi/varieng/series/volumes/08/rissanen/

Rissanen, M. (2012). Grammaticalisation, contact and corpora: on the development of adverbial connectives in English. In I. Hegedűs and A. Fodor (Eds) English Historical Linguistics 2010: Selected Papers from the Sixteenth International Conference on English Historical Linguistics (ICEHL 16), Pécs, 23-27 August 2010 131-152. Amsterdam: John Benjamins. https://doi.org/10.1075/cilt.325.06ris

Rissanen, M. (2014a). From medieval to modern: on the development of the adverbial connective considering (that). In I. Taavitsainen, M. Kytö, C. Claridge, and J. Smith (Eds) Developments in English: Expanding Electronic Evidence 98-115. Cambridge: Cambridge University Press. https://doi.org/10.1017/CBO9781139833882.009

Rissanen, M. (2014b). On English historical corpora, with notes on the development of adverbial connectives. In A. Sintes and S. Hernández (Eds) Diachrony and Synchrony in English Corpus Linguistics 109-140. Bern: Peter Lang.

Secondary Sources

Battle, S., Wood, D., Leigh, J., and Ruth, L. (2012). The Callimachus Project: RDF as a web template language. In J. F. Sequeda, A. Harth, and O. Hartig (Eds) Proceedings of the Third International Conference on Consuming Linked Data 1-14. CEUR-WS.org. urn:nbn:de:0074-905-3

Berners-Lee, T. (2006). Linked data. In Design Issues: Architectural and Philosophical Points. Retrieved on 12 April 2018 from https://www.w3.org/DesignIssues/LinkedData.html https://doi.org/10.1145/147126.147133

Bizer, C., Heath, T. and Berners-Lee, T. (2009). Linked data - the story so far. International Journal on Semantic Web and Information Systems 5 (3): 1-22. https://doi.org/10.4018/jswis.2009081901

Blythe, R. A. and Croft, W. (2012). S-curves and the mechanisms of propagation in language change. Language 88 (2): 269-304. https://doi.org/10.1353/lan.2012.0027

Chaudron, C. (2006). Some reflections on the development of (meta-analytic) synthesis in second language research. In J. M. Norris and L. Ortega (Eds) Synthesizing Research on Language Learning and Teaching 323-339. Amsterdam: John Benjamins. https://doi.org/10.1075/lllt.13.17cha

Chiarcos, C., Hellmann, S., and Nordhoff, S. (2011). Towards a Linguistic Linked Open Data cloud: the Open Linguistics Working Group. Traitement Automatique des Langues 52 (3): 245-275.

Chiarcos, C., McCrae, J., Cimiano, P., and Fellbaum, C. (2013). Towards open data for linguistics: Linguistic linked data. In A. Oltramari, P. Vossen, L. Qin, and E. Hovy (Eds) New Trends of Research in Ontologies and Lexical Resources 7-25. Berlin: Springer. https://doi.org/10.1007/978-3-642-31782-8_2

Durrant, P. (2014). Corpus frequency and second language learners' knowledge of collocations. International Journal of Corpus Linguistics 19 (4): 443-477. https://doi.org/10.1075/ijcl.19.4.01dur

Flanagan, J. (2017). Reproducible research: Strategies, tools, and workflows. In T. Hiltunen, J. McVeigh and T. Säily (Eds) Big and Rich Data in English Corpus Linguistics: Methods and Explorations (Studies in Variation, Contacts and Change in English 19). Helsinki: VARIENG. Retrieved on 12 April 2018 from http://www.helsinki.fi/varieng/series/volumes/19/flanagan/

Francis, W. and Kučera, F. (1979). Manual of Information to Accompany A Standard Corpus of Present-Day Edited American English, for Use with Digital Computers. Providence, RI: Department of Linguistics, Brown University.

HC = Helsinki Corpus of English Texts (1991). Compiled by M. Rissanen (Project leader), M. Kytö (Project secretary); L. Kahlas-Tarkka, M. Kilpiö (Old English); S. Nevanlinna, I. Taavitsainen (Middle English); T. Nevalainen, H. Raumolin-Brunberg (Early Modern English). Helsinki: Department of Modern Languages, University of Helsinki.

Johansson, S., Leech, G., and Goodluck, H. (1978). Manual of Information to Accompany the Lancaster-Oslo/Bergen Corpus of British English, for Use with Digital Computers. Oslo: Department of English, University of Oslo.

Kalampokis, E., Nikolov, A., Haase, P., Cyganiak, R., Stasiewicz, A., Karamanou, A., Zotou, M., Zeginis, D., Tambouris, E., and Tarabanis, K. A. (2014). Exploiting linked data cubes with OpenCube Toolkit. In M. Horridge, M. Rospocher, and J. van Ossenbruggen (Eds) International Semantic Web Conference (Posters & Demos) 137-140. CEUR-WS.org. urn:nbn:de:0074-1272-7

Kesäniemi, J., Vartiainen, T., Säily, T., and Nevalainen, T. (2018). Open science for English historical corpus linguistics: Introducing the Language Change Database. In E. Mäkelä, M. Tolonen, and J. Tuominen (Eds) Proceedings of the Digital Humanities in the Nordic Countries 3rd Conference, Helsinki, Finland, March 7-9, 2018 (CEUR Workshop Proceedings 2084) 51-62. CEUR-WS.org. Retrieved on 16 November 2018 from http://ceur-ws.org/Vol-2084/paper4.pdf

Lijffijt, J., Säily, T., and Nevalainen, T. (2012). CEECing the baseline: Lexical stability and significant change in a historical corpus. In J. Tyrkkö, M. Kilpiö, T. Nevalainen, and M. Rissanen (Eds) Outposts of Historical Corpus Linguistics: From the Helsinki Corpus to a Proliferation of Resources (Studies in Variation, Contacts and Change in English 10). Helsinki: VARIENG. Retrieved on 12 April 2018 from http://www.helsinki.fi/varieng/series/volumes/10/lijffijt_saily_nevalainen/

Lohmann, S., Negru, S., Haag, F., and Ertl, T. (2016). Visualizing ontologies with VOWL. Semantic Web 7 (4): 399-419. https://doi.org/10.3233/SW-150200

McCrae, J., Chiarcos, C., Bond, F., Cimiano, P., Declerck, T., de Melo, G., Gracia, J., Hellmann, S., Klimek, B., Moran, S., and Osenova, P. (2016). The Open Linguistics Working Group: developing the Linguistic Linked Open Data cloud. In N. Calzolari, K. Choukri, T. Declerck, S. Goggi, M. Grobelnik, B. Maegaard, J. Mariani, H. Mazo, A. Moreno, J. Odijk and S. Piperidis (Eds) Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016) 2435-2441. Paris: ELRA.

McCrae, J., Fellbaum, C., and Cimiano, P. (2014). Publishing and linking WordNet using Lemon and RDF. In C. Chiarcos, J. McCrae, P. Osenova and C. Vertan (Eds) Proceedings of the 3rd Workshop on Linked Data in Linguistics. urn:nbn:de:0070-pub-27327797

Meroño-Peñuela, A., Ashkpour, A., Rietveld, L., Hoekstra, R., and Schlobach, S. (2012). Linked humanities data: the next frontier? A case-study in historical census data. In T. Kauppinen, L. C. Pouchard, and C. Keßler (Eds) Proceedings of the 2nd International Workshop on Linked Science 2012 (LISC2012). CEUR-WS.org. urn:nbn:de:0074-951-6

Nevalainen, T., Säily, T., Vartiainen, T., Liimatta, A., and Lijffijt, J. (in preparation). History of English as punctuated equilibria? Journal of Historical Sociolinguistics.

Nevalainen, T., Vartiainen, T., Säily, T., Kesäniemi, J., Dominowska, A., and Öhman, E. (2016). Language Change Database: A new online resource. ICAME Journal 40: 77-94. https://doi.org/10.1515/icame-2016-0006

Newberry, M. G., Ahern, C. A., Clark, R., and Plotkin, J. B. (2017). Detecting evolutionary forces in language change. Nature 551: 223-226. https://doi.org/10.1038/nature24455

Norris, J. and Ortega, L. (2000). Effectiveness of L2 instruction: A research synthesis and quantitative meta-analysis. Language Learning 50 (3): 417-528. https://doi.org/10.1111/0023-8333.00136

Raumolin-Brunberg, H. (1998). Social factors and pronominal change in the seventeenth century: The Civil-War effect? In J. Fisiak and M. Krygier (Eds) Advances in English Historical Linguistics 361-388. Berlin: Mouton de Gruyter.

Renehan, A., Tyson, M., Egger, M., Heller, R., and Zwahlen, M. (2008). Body-mass index and incidence of cancer: A systematic review and meta-analysis of prospective observational studies. Lancet 371 (9612): 569-578. https://doi.org/10.1016/S0140-6736(08)60269-X

Reynolds, K., Lewis, B., Nolen, J., Kinney, G., Sathya, B., and He, J. (2003). Alcohol consumption and risk of stroke: a meta-analysis. JAMA 289 (5): 579-588. https://doi.org/10.1001/jama.289.5.579

Rissanen, M. (2003). On the development of English adverbial connectives. In M. Ukaji, M. Ike-Uchi and Y. Nishimura (Eds) Current Issues in English Linguistics (Special Publications of the English Linguistic Society of Japan 2) 229-247. Tokyo: Kaitakusha.

Trudgill, P. (2011). Sociolinguistic Typology: Social Determinants of Linguistic Complexity. Oxford: Oxford University Press.

van Assem, M., Menken, M. R., Schreiber, G., Wielemaker, J., and Wielinga, B. (2004). A method for converting thesauri to RDF/OWL. In S. A. McIlraith, D. Plexousakis, and F. van Harmelen (Eds) The Semantic Web - ISWC 2004 17-31. Berlin: Springer. https://doi.org/10.1007/978-3-540-30475-3_3

Wang, A., Wang, S., Zhu, C., Huang, H., Wu, L., Wan, X., Yang, X., Zhang, H., Miao, R., He, L., Sang, X., and Zhao, H. (2016). Coffee and cancer risk: A meta-analysis of prospective observational studies. Scientific Reports 6: 33711. https://doi.org/10.1038/srep33711

Wilkinson, M. D., Dumontier, M., Aalbersberg, I. J., Appleton, G., Axton, M., Baak, A., Blomberg, N., Boiten, J.-W., da Silva Santos, L. B., Bourne, P. E., Bouwman, J., Brookes, A. J., Clark, T., Crosas, M., Dillo, I., Dumon, O., Edmunds, S., Evelo, C. T., Finkers, R., Gonzalez-Beltran, A., Gray, A. J. G., Groth, P., Goble, C., Grethe, J. S., Heringa, J., 't Hoen, P. A. C., Hooft, R., Kuhn, T., Kok, R., Kok, J., Lusher, S. J., Martone, M. E., Mons, A., Packer, A. L., Persson, B., Rocca-Serra, P., Roos, M., van Schaik, R., Sansone, S.-A., Schultes, E., Sengstag, T., Slater, T., Strawn, G., Swertz, M. A., Thompson, M., van der Lei, J., van Mulligen, E., Velterop, J., Waagmeester, A., Wittenburg, P., Wolstencroft, K., Zhao, J., and Mons, B. (2016). The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data 3: 160018. https://doi.org/10.1038/sdata.2016.18

Wood, D. (2015). Semantic composition of disparate data in GeoHealthUS for navigation, display and analysis. Poster, International Semantic Web Conference (ISWC) 2015. https://doi.org/10.13140/RG.2.2.33948.59528






How to Cite

Kesäniemi, J., Vartiainen, T., Säily, T., & Nevalainen, T. (2019). Exploring Meta-analysis for Historical Corpus Linguistics Based on Linked Data. Journal of Research Design and Statistics in Linguistics and Communication Science, 5(1-2), 4-47. https://doi.org/10.1558/jrds.36709