Exploring the Use of Machine Learning to Automate the Qualitative Coding of Church-related Tweets
DOI:
https://doi.org/10.1558/firn.40610
Keywords:
digital theology, machine learning, sociology of religion, social media research
Abstract
This article builds on previous research exploring the content of church-related tweets. It asks whether the qualitative thematic coding of such tweets can, in part, be automated using machine learning. It compares three supervised machine learning algorithms on a classification task, using a dataset of human-coded church-related tweets, to understand how useful each algorithm is. The study finds that one of these algorithms, Naïve-Bayes, performs better than the others considered, returning Precision, Recall and F-measure values that each exceed an acceptable threshold of 70%. This has far-reaching consequences at a time when the high volume of social media data, in this case Twitter data, means that the resource-intensity of manual coding approaches can act as a barrier to understanding how the online community interacts with, and talks about, church. The findings presented in this article offer a way forward for scholars of digital theology to better understand the content of online church discourse.
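To make the evaluation measures named in the abstract concrete, the following is a minimal sketch of how Precision, Recall and F-measure can be computed for a single theme by comparing human codes with a classifier's output. It is an illustration only, not the article's implementation: the function name, the theme labels ("event", "prayer") and the toy data are hypothetical, and the 0.70 constant simply mirrors the 70% threshold mentioned above.

```python
def precision_recall_f1(gold, predicted, label):
    """Return (precision, recall, F-measure) for one theme label.

    gold      -- codes assigned by human coders (hypothetical toy data below)
    predicted -- codes assigned by a classifier for the same tweets
    """
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == label and p == label)  # true positives
    fp = sum(1 for g, p in pairs if g != label and p == label)  # false positives
    fn = sum(1 for g, p in pairs if g == label and p != label)  # false negatives

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F-measure: harmonic mean of precision and recall
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# Hypothetical example: five tweets coded into two illustrative themes.
human_codes = ["event", "event", "prayer", "prayer", "event"]
model_codes = ["event", "prayer", "prayer", "prayer", "event"]

THRESHOLD = 0.70  # assumption: mirrors the 70% threshold in the abstract

scores = precision_recall_f1(human_codes, model_codes, "event")
meets_threshold = all(value > THRESHOLD for value in scores)
print(scores, meets_threshold)
```

In this toy case the "event" theme has perfect precision but misses one human-coded tweet, so its recall falls below the threshold even though its F-measure does not. This is why the three measures are reported separately.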