The Promise and Peril of the Data Deluge for Historians
DOI:
https://doi.org/10.1558/jch.21156
Keywords:
Big Data, data mining, HARKing, dimension reduction
Abstract
Historical analyses are inevitably based on data – documents, fossils, drawings, oral traditions, artifacts, and more. Recently, historians have been urged to embrace the data deluge (Guldi and Armitage 2014) and teams are now systematically assembling large digital collections of historical data that can be used for rigorous statistical analysis (Slingerland and Sullivan 2017; Turchin et al. 2015; Whitehouse et al. 2019; Slingerland et al. 2018–2019). The promise of large, widely accessible databases is the opportunity for rigorous statistical testing of plausible historical models. The peril is the temptation to ransack these databases for heretofore unknown statistical patterns. Statisticians bearing algorithms are a poor substitute for expertise.
References
Akaev, A. A., V. I. Pantin, and A. E. Ayvazov. 2009. “Analiz dinamiki dvizheniya mirovogo ekonomicheskogo krizisa na osnove teorii tsiklov.” Doklad na Pervom Rossiyskom ekonomicheskom kongresse, MGU im. M.V. Lomonosova [“Analysis of the Dynamics of Motion of the Global Economic Crisis on the Basis of the Theory of Cycles.” Paper presented at the First Russian Economic Congress, Moscow State University].
Ambasciano, L. 2017. “Exiting the Motel of the Mysteries? How Historiographical Floccinaucinihilipilification Is Affecting CSR 2.0.” In Religion Explained? The Cognitive Science of Religion after Twenty-Five Years, eds L. H. Martin and D. Wiebe, 107–22. London and New York: Bloomsbury. https://doi.org/10.5040/9781350032491.ch-009
Artigue, H. M. and G. Smith. 2019. “The Principal Problem with Principal Components Regression.” Cogent Mathematics & Statistics 6(1): 1622190. https://doi.org/10.1080/25742558.2019.1622190
Babyak, M. A. 2004. “What You See May Not Be What You Get: A Brief, Nontechnical Introduction to Overfitting In Regression-Type Models.” Psychosomatic Medicine 66(3): 411–21. https://doi.org/10.1097/01.psy.0000127692.23278.a9
Begoli, E. and J. Horsey. 2012. “Design Principles for Effective Knowledge Discovery From Big Data.” Software Architecture (WICSA) and European Conference on Software Architecture (ECSA), 2012 Joint Working IEEE/IFIP Conference. https://doi.org/10.1109/WICSA-ECSA.212.32
Calude, C. S. and G. Longo. 2017. “The Deluge of Spurious Correlations in Big Data.” Foundations of Science 22(3): 595–612. https://doi.org/10.1007/s10699-016-9489-4
Chase-Dunn, C. and B. Podobnik. 1995. “The Next World War: World-System Cycles and Trends.” Journal of World-Systems Research 1(1): 1–47. https://doi.org/10.5195/jwsr.1995.40
Cios, K. J., W. Pedrycz, R. W. Swiniarski, and L. A. Kurgan. 2007. Data Mining: A Knowledge Discovery Approach. New York: Springer.
Elliott, Ralph Nelson. 1938. The Wave Principle. New York: Elliott.
Fayyad, U., G. Piatetsky-Shapiro, and P. Smyth. 1996. “From Data Mining To Knowledge Discovery in Databases.” AI Magazine 17(3): 37–54. https://doi.org/10.1609/aimag.v17i3.1230
Goldstein, J. 1988. Long Cycles: Prosperity and War in the Modern Age. New Haven, CT: Yale University Press.
Guldi, J. and D. Armitage. 2014. The History Manifesto. Cambridge: Cambridge University Press. https://www.cambridge.org/core/what-we-publish/open-access/the-history-manifesto https://doi.org/10.1017/9781139923880
Hendry, D. F. and H. M. Krolzig. 2001. Automatic Econometric Model Selection. London: Timberlake Consultants Press.
Hocking, R. R. 1976. “The Analysis and Selection of Variables in Linear Regression.” Biometrics 32(1): 1–49. https://doi.org/10.2307/2529336
Hotelling, H. 1933. “Analysis of a Complex of Statistical Variables into Principal Components.” Journal of Educational Psychology 24: 417–41, 498–520. https://psycnet.apa.org/doi/10.1037/h0071325 https://doi.org/10.1037/h0070888
Hotelling, H. 1936. “Relations Between Two Sets of Variates.” Biometrika 28(3–4): 321–77. https://doi.org/10.2307/2333955
Hotelling, H. 1957. “The Relations of the Newer Multivariate Statistical Methods to Factor Analysis.” British Journal of Statistical Psychology 10(2): 69–79. https://doi.org/10.1111/j.2044-8317.1957.tb00179.x
Hurvich, C. M. and C. L. Tsai. 1990. “The Impact of Model Selection on Inference in Linear Regression.” American Statistician 44(3): 214–17. https://doi.org/10.2307/2685338
Kendall, M. G. 1957. A Course in Multivariate Analysis. London: Griffin.
Kohler, T. A. 2018. “Our Unfinished Agenda (What I Have Learned).” The SAA Archaeological Record 18(5): 37–42. http://onlinedigeditions.com/publication/?i=542220&article_id=3236418&view=articleBrowser
Kondratieff, N. D. 1925. The Major Economic Cycles (in Russian). Moscow. Translated and published in 1984 as The Long Wave Cycle. New York: Richardson & Snyder.
Kondratieff, N. D. and W. F. Stolper. 1935. “The Long Waves in Economic Life.” Review of Economic Statistics 17(6): 105–15. https://doi.org/10.2307/1928486
Mandel, E. 1980. Long Waves of Capitalist Development: The Marxist Interpretation. Based on The Marshall Lectures Given at the University of Cambridge, 1978. Cambridge: Cambridge University Press.
Mansfield, E. R., J. T. Webster, and R. F. Gunst. 1977. “An Analytic Variable Selection Technique for Principal Component Regression.” Applied Statistics 26(1): 34–40. https://doi.org/10.2307/2346865
Modelski, G. and W. R. Thompson. 1996. Leading Sectors and World Powers: The Coevolution of Global Economics and Politics. Columbia: University of South Carolina Press.
Mosteller, F. and J. W. Tukey. 1977. Data Analysis and Regression: A Second Course in Statistics. Reading, MA: Addison-Wesley.
Pearson, K. 1901. “On Lines and Planes of Closest Fit to Systems of Points in Space.” Philosophical Magazine 2(11): 559–72. https://doi.org/10.1080/14786440109462720
Quigley, C. 2012. “Kondratieff Waves and the Greater Depression of 2013–2020.” https://www.financialsense.com/contributors/christopher-quigley/kondratieff-waves-and-the-greater-depression-of-2013-2020. Accessed 10 March 2020.
Sagiroglu, S. and D. Sinanc. 2013. “Big Data: A Review.” Collaboration Technologies and Systems (CTS), 2013 International Conference. https://doi.org/10.1109/CTS.2013.6567202
Salum, F. and P. Vicente. 2017. “The Next Cycle of Capitalism.” INSEAD Knowledge. https://knowledge.insead.edu/strategy/the-next-cycle-of-capitalism-5226. Accessed 10 March 2020.
Skwarek, S. n.d. “Kondratieff Wave.” CMT Association. https://cmtassociation.org/kb/kondratieff-wave/. Accessed 20 December 2018.
Slingerland, E. and B. Sullivan. 2017. “Durkheim with Data: The Database of Religious History.” Journal of the American Academy of Religion 85(2): 312–47. https://doi.org/10.1093/jaarel/lfw012
Slingerland, E. et al. 2018–2019. “Historians Respond to Whitehouse et al. (2019), ‘Complex Societies Precede Moralizing Gods Throughout World History.’” Journal of Cognitive Historiography 5(1–2): 124–41. https://doi.org/10.1558/jch.39393
Smith, G. 2018a. “Step Away From StepWise.” Journal of Big Data 5: 32. https://doi.org/10.1186/s40537-018-0143-6
Smith, G. 2018b. The AI Delusion. Oxford: Oxford University Press. https://doi.org/10.1093/oso/9780198824305.001.0001
Smith, G. 2020. “Data Mining Fool’s Gold.” Journal of Information Technology. https://doi.org/10.1177/0268396220915600
Smith, G. and J. Cordes. 2019. The 9 Pitfalls of Data Science. Oxford: Oxford University Press. https://doi.org/10.1093/oso/9780198844396.001.0001
Smith, G. and J. Cordes. 2020. The Phantom Pattern Problem: The Mirage of Big Data. Oxford: Oxford University Press. https://doi.org/10.1093/oso/9780198864165.001.0001
Spinney, L. 2019. “History as a Giant Data Set: How Analysing the Past Could Help Save the Future.” The Guardian, 12 November. https://www.theguardian.com/technology/2019/nov/12/history-as-a-giant-data-set-how-analysing-the-past-could-help-save-the-future. Accessed 10 March 2020.
Stevenson, P. W. 2016. “Professor Who Predicted 30 Years of Presidential Elections Correctly Called a Trump Win in September.” The Washington Post, 8 November.
Thompson, B. 1995. “Stepwise Regression and Stepwise Discriminant Analysis Need Not Apply Here: A Guidelines Editorial.” Educational and Psychological Measurement 55: 525–34. https://doi.org/10.1177/0013164495055004001
Tosh, N., J. Ferguson, and C. Seoighe. 2018. “History by the Numbers?” Proceedings of the National Academy of Sciences 115(26): E5840. https://doi.org/10.1073/pnas.1807023115
Turchin, P. 2003. Historical Dynamics: Why States Rise and Fall. Princeton and Oxford: Princeton University Press. https://doi.org/10.1515/9781400889310
Turchin, P., et al. 2015. “Seshat: The Global History Databank.” Cliodynamics 6(1): 77–107. https://doi.org/10.21237/C7clio6127917
Turchin, P., et al. 2018. “Quantitative Historical Analysis Uncovers a Single Dimension of Complexity that Structures Global Variation in Human Social Organization.” Proceedings of the National Academy of Sciences 115(2): E144–E151. https://doi.org/10.1073/pnas.1708800115
Whitehouse, H. et al. 2019. “Complex Societies Precede Moralizing Gods Throughout World History.” Nature 568: 226–29. https://doi.org/10.1038/s41586-019-1043-4