Overcoming issues in frequency-based extraction and lexicographic inclusion of Korean neologisms
A triangulation approach
Keywords: Neologism, Phrasal neologism, Semantic neologism, Web corpus, Triangulation approach
This paper discusses issues regarding frequency as a criterion for Korean neologism extraction from the perspective of corpus linguistics and lexicography. Most studies agree that frequency plays a central role in the inclusion of neologisms in the dictionary; however, frequency involves a number of complex factors, such as the time span of a word’s use and the variety of registers in which it appears. The use of web data, rather than a balanced corpus, to extract neologisms has raised a new range of issues that call for new solutions. Section 2 reviews previous research trends related to neologism frequency from the point of view of linguistics and neologism studies. Section 3 examines issues in the detection of phrasal and semantic neologisms and in the use of web corpora. Section 4 proposes triangulation as a way to cope with these shortcomings, combining a use-based methodology with a user-based approach.