Introducing NeoRate

A corpus-lexicographic tool to support the detection of unregistered words for a German COVID-19 discourse dictionary

Authors

  • Annette Klosa-Kückelhaus, Leibniz-Institut für Deutsche Sprache (IDS)
  • Jan Oliver Rüdiger, Leibniz-Institut für Deutsche Sprache (IDS)

DOI:

https://doi.org/10.1558/lexi.26366

Keywords:

neologism detection, corpus-lexicographic tool, COVID-19 discourse

Abstract

In this article, we describe the development and application of a corpus-lexicographic tool for finding neologisms that are not yet listed in German dictionaries. As a starting point, we used the words listed in a glossary of German neologisms surrounding the COVID-19 pandemic. These words are lemma candidates for a new dictionary on COVID-19 discourse in German. They also provided the database used to develop and test the NeoRate tool. We report on the lexicographic work in our dictionary project, present the design and functionalities of NeoRate, and describe the first test results with the tool, in particular with regard to previously unregistered words. Finally, we discuss further development of the tool and its possible applications.

Author Biographies

  • Annette Klosa-Kückelhaus, Leibniz-Institut für Deutsche Sprache (IDS)

    Annette Klosa-Kückelhaus heads the Lexicography and Language Documentation program area in the Department of Lexicology at the Leibniz Institute for the German Language (IDS) in Mannheim. She has acted as editor-in-large for “elexiko – online dictionary of contemporary German” as well as the neologism dictionary at IDS.

  • Jan Oliver Rüdiger, Leibniz-Institut für Deutsche Sprache (IDS)

    Jan Oliver Rüdiger is a post-doctoral researcher in the Department for Lexical Studies at the Leibniz Institute for the German Language (IDS) in Mannheim. His research interests are text and data mining, digital humanities, and natural language processing, and he develops research software for these areas.

Published

2023-11-15

How to Cite

Klosa-Kückelhaus, A., & Rüdiger, J. O. (2023). Introducing NeoRate: A corpus-lexicographic tool to support the detection of unregistered words for a German COVID-19 discourse dictionary. Lexicography, 10(2), 117–137. https://doi.org/10.1558/lexi.26366