Introducing NeoRate
A corpus-lexicographic tool to support the detection of unregistered words for a German COVID-19 discourse dictionary
DOI: https://doi.org/10.1558/lexi.26366
Keywords: neologism detection, corpus-lexicographic tool, COVID-19 discourse
Abstract
In this article, we provide insight into the development and application of a corpus-lexicographic tool for finding neologisms that are not yet listed in German dictionaries. As a starting point, we used the words listed in a glossary of German neologisms related to the COVID-19 pandemic. These words are lemma candidates for a new dictionary of German COVID-19 discourse, and they also provided the data used to develop and test the NeoRate tool. We report on the lexicographic work in our dictionary project and on the design and functionalities of NeoRate, and we describe the first test results obtained with the tool, particularly with regard to previously unregistered words. Finally, we discuss the further development of the tool and its possible applications.
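The idea of a "previously unregistered" word can be illustrated with a short sketch: a candidate lemma counts as unregistered if none of the consulted dictionaries lists it as a headword. This is a hypothetical illustration and not NeoRate's actual implementation, which the abstract does not specify. The function names, the matching rules (Unicode NFC normalization plus case folding) and the toy dictionaries `dict_a` and `dict_b` are all assumptions.

```python
import unicodedata


def normalize(lemma: str) -> str:
    # Hypothetical matching rule: NFC-normalize so precomposed and decomposed
    # umlauts compare equal, then casefold for case-insensitive comparison.
    return unicodedata.normalize("NFC", lemma).casefold()


def dictionary_coverage(candidates, dictionaries):
    """Map each candidate lemma to the sorted names of the dictionaries listing it."""
    index = {name: {normalize(h) for h in headwords}
             for name, headwords in dictionaries.items()}
    return {c: sorted(name for name, hw in index.items() if normalize(c) in hw)
            for c in candidates}


def unregistered(candidates, dictionaries):
    """Return the candidates listed in none of the dictionaries, in input order."""
    coverage = dictionary_coverage(candidates, dictionaries)
    return [c for c in candidates if not coverage[c]]


if __name__ == "__main__":
    # Hypothetical toy data, not drawn from any real dictionary.
    candidates = ["Lockdown", "Coronaparty", "Maskenpflicht", "Hamsterkauf"]
    dictionaries = {
        "dict_a": {"lockdown", "Maskenpflicht"},
        "dict_b": {"Hamsterkauf", "Maskenpflicht"},
    }
    print(dictionary_coverage(candidates, dictionaries))
    print(unregistered(candidates, dictionaries))  # ['Coronaparty']
```

Keeping the per-dictionary coverage, rather than a single yes/no flag, also shows which dictionaries already list a candidate. This information is useful when only some reference works have registered a new word.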