The Promise and Peril of the Data Deluge for Historians


  • Gary N. Smith, Pomona College



Big Data, data mining, HARKing, dimension reduction


Historical analyses are inevitably based on data – documents, fossils, drawings, oral traditions, artifacts, and more. Recently, historians have been urged to embrace the data deluge (Guldi and Armitage 2014) and teams are now systematically assembling large digital collections of historical data that can be used for rigorous statistical analysis (Slingerland and Sullivan 2017; Turchin et al. 2015; Whitehouse et al. 2019; Slingerland et al. 2018–2019). The promise of large, widely accessible databases is the opportunity for rigorous statistical testing of plausible historical models. The peril is the temptation to ransack these databases for heretofore unknown statistical patterns. Statisticians bearing algorithms are a poor substitute for expertise.

Author Biography

Gary N. Smith, Pomona College

Gary N. Smith is the Fletcher Jones Professor of Economics at Pomona College, Claremont, CA. Smith has a long history of research projects debunking dubious uses of data in statistical analysis. He is the author of eight textbooks, seven trade books, nearly 100 academic papers, and seven software programs on economics, finance, and statistics. His book The AI Delusion (Oxford University Press, 2018) argues that, in the age of Big Data, the real danger is not that computers are smarter than us, but that we think they are and therefore trust them to make important decisions for us. His most recent books are The 9 Pitfalls of Data Science (Oxford University Press, 2019; winner of the PROSE Award for Excellence in Popular Science & Popular Mathematics) and The Phantom Pattern Problem: The Mirage of Big Data (Oxford University Press, 2020), both co-authored with Jay Cordes.


Akaev, A. A., V. I. Pantin, and A. E. Ayvazov. 2009. “Analiz dinamiki dvizheniya mirovogo ekonomicheskogo krizisa na osnove teorii tsiklov.” Doklad na Pervom Rossiyskom ekonomicheskom kongresse, MGU im. M.V. Lomonosova [“Analysis of the Dynamics of Motion of the Global Economic Crisis on the Basis of the Theory of Cycles.” Paper presented at the First Russian Economic Congress, Moscow State University].

Ambasciano, L. 2017. “Exiting the Motel of the Mysteries? How Historiographical Floccinaucinihilipilification Is Affecting CSR 2.0.” In Religion Explained? The Cognitive Science of Religion after Twenty-Five Years, eds L. H. Martin and D. Wiebe, 107–22. London and New York: Bloomsbury.

Artigue, H. M. and G. Smith. 2019. “The Principal Problem with Principal Components Regression.” Cogent Mathematics & Statistics 6(1): 1622190.

Babyak, M. A. 2004. “What You See May Not Be What You Get: A Brief, Nontechnical Introduction to Overfitting In Regression-Type Models.” Psychosomatic Medicine 66(3): 411–21.

Begoli, E. and J. Horsey. 2012. “Design Principles for Effective Knowledge Discovery From Big Data.” Software Architecture (WICSA) and European Conference on Software Architecture (ECSA), 2012 Joint Working IEEE/IFIP Conference.

Calude, C. S. and G. Longo. 2017. “The Deluge of Spurious Correlations in Big Data.” Foundations of Science 22(3): 595–612.

Chase-Dunn, C. and B. Podobnik. 1995. “The Next World War: World-System Cycles and Trends.” Journal of World-Systems Research 1(1): 1–47.

Cios, K. J., W. Pedrycz, R. W. Swiniarski, and L. A. Kurgan. 2007. Data Mining: A Knowledge Discovery Approach. New York: Springer.

Elliott, Ralph Nelson. 1938. The Wave Principle. New York: Elliott.

Fayyad, U., G. Piatetsky-Shapiro, and P. Smyth. 1996. “From Data Mining To Knowledge Discovery in Databases.” AI Magazine 17(3): 37–54.

Goldstein, J. 1988. Long Cycles: Prosperity and War in the Modern Age. New Haven, CT: Yale University Press.

Guldi, J. and D. Armitage. 2014. The History Manifesto. Cambridge: Cambridge University Press.

Hendry, D. F. and H. M. Krolzig. 2001. Automatic Econometric Model Selection. London: Timberlake Consultants Press.

Hurvich, C. M. and C. L. Tsai. 1990. “The Impact of Model Selection on Inference in Linear Regression.” American Statistician 44(3): 214–17.

Hocking, R. R. 1976. “The Analysis and Selection of Variables in Linear Regression.” Biometrics 32(1): 1–49.

Hotelling, H. 1933. “Analysis of a Complex of Statistical Variables into Principal Components.” Journal of Educational Psychology 24: 417–41, 498–520.

Hotelling, H. 1936. “Relations Between Two Sets of Variates.” Biometrika 28(3–4): 321–77.

Hotelling, H. 1957. “The Relations of the Newer Multivariate Statistical Methods to Factor Analysis.” British Journal of Statistical Psychology 10(2): 69–79.

Kendall, M. G. 1957. A Course in Multivariate Analysis. London: Griffin.

Kohler, T. A. 2018. “Our Unfinished Agenda (What I Have Learned).” The SAA Archaeological Record 18(5): 37–42.

Kondratieff, N. D. 1925. The Major Economic Cycles (in Russian). Moscow. Translated and published in 1984 as The Long Wave Cycle. New York: Richardson & Snyder.

Kondratieff, N. D. and W. F. Stolper. 1935. “The Long Waves in Economic Life.” Review of Economic Statistics 17(6): 105–15.

Mandel, E. 1980. Long Waves of Capitalist Development: The Marxist Interpretation. Based on The Marshall Lectures Given at the University of Cambridge, 1978. Cambridge: Cambridge University Press.

Mansfield, E. R., J. T. Webster, and R. F. Gunst. 1977. “An Analytic Variable Selection Technique for Principal Component Regression.” Applied Statistics 26(1): 34–40.

Modelski, G., and W. R. Thompson. 1996. Leading Sectors and World Powers: The Coevolution of Global Economics and Politics. Columbia: University of South Carolina Press.

Mosteller, F., and J. W. Tukey. 1977. Data Analysis and Regression: A Second Course in Statistics. Reading, MA: Addison-Wesley.

Pearson, K. 1901. “On Lines and Planes of Closest Fit to Systems of Points in Space.” Philosophical Magazine 2(11): 559–72.

Quigley, C. 2012. “Kondratieff Waves and the Greater Depression of 2013–2020.” Accessed 10 March 2020.

Sagiroglu, S. and D. Sinanc. 2013. “Big Data: A Review.” Collaboration Technologies and Systems (CTS), 2013 International Conference.

Salum, F. and P. Vicente. 2017. “The Next Cycle of Capitalism.” INSEAD Knowledge. Accessed 10 March 2020.

Skwarek, S. n.d. “Kondratieff Wave.” CMT Association. Retrieved 2018–12–20.

Slingerland, E. and B. Sullivan. 2017. “Durkheim with Data: The Database of Religious History.” Journal of the American Academy of Religion 85(2): 312–47.

Slingerland, E. et al. 2018–2019. “Historians Respond to Whitehouse et al. (2019), ‘Complex Societies Precede Moralizing Gods Throughout World History.’” Journal of Cognitive Historiography 5(1–2): 124–41.

Smith, G. 2018a. “Step Away From StepWise.” Journal of Big Data 5: 32.

Smith, G. 2018b. The AI Delusion. Oxford: Oxford University Press.

Smith, G. 2020. “Data Mining Fool’s Gold.” Journal of Information Technology.

Smith, G. and J. Cordes. 2019. The 9 Pitfalls of Data Science. Oxford: Oxford University Press.

Smith, G. and J. Cordes. 2020. The Phantom Pattern Problem: The Mirage of Big Data. Oxford: Oxford University Press.

Spinney, L. 2019. “History as a Giant Data Set: How Analysing the Past Could Help Save the Future.” The Guardian, 12 November. Accessed 10 March 2020.

Stevenson, P. W. 2016. “Professor Who Predicted 30 Years of Presidential Elections Correctly Called a Trump Win in September.” The Washington Post, 8 November.

Thompson, B. 1995. “Stepwise Regression and Stepwise Discriminant Analysis Need Not Apply Here: A Guidelines Editorial.” Educational and Psychological Measurement 55: 525–34.

Tosh, N., J. Ferguson, and C. Seoighe. 2018. “History by the numbers?” Proceedings of the National Academy of Sciences 115(26): E5840.

Turchin, P. 2003. Historical Dynamics: Why States Rise and Fall. Princeton and Oxford: Princeton University Press.

Turchin, P., et al. 2015. “Seshat: The Global History Databank.” Cliodynamics 6(1): 77–107.

Turchin, P., et al. 2018. “Quantitative Historical Analysis Uncovers a Single Dimension of Complexity that Structures Global Variation in Human Social Organization.” Proceedings of the National Academy of Sciences 115(2): E144–E151.

Whitehouse, H. et al. 2019. “Complex Societies Precede Moralizing Gods Throughout World History.” Nature 568: 226–29.



How to Cite

Smith, G. N. (2022). The Promise and Peril of the Data Deluge for Historians. Journal of Cognitive Historiography, 6(1–2), 277–287.