Authorship attribution, idiolectal style, and online identity
A specialised corpus of Najdi Arabic Tweets
DOI: https://doi.org/10.1558/ijsll.27343
Keywords: authorship analysis, forensic linguistics, corpus linguistics, online identity, computer-mediated discourse analysis (CMDA), machine learning
Assistant Professor
Department of English Language
College of Language Sciences
King Saud University
Riyadh 11495, Kingdom of Saudi Arabia
Awarding Institution: University of Leeds, UK
Date of Award: 14 October 2022