The Promise and Peril of the Data Deluge for Historians

Authors

  • Gary N. Smith, Pomona College

DOI:

https://doi.org/10.1558/jch.21156

Keywords:

Big Data, data mining, HARKing, dimension reduction

Abstract

Historical analyses are inevitably based on data – documents, fossils, drawings, oral traditions, artifacts, and more. Recently, historians have been urged to embrace the data deluge (Guldi and Armitage 2014) and teams are now systematically assembling large digital collections of historical data that can be used for rigorous statistical analysis (Slingerland and Sullivan 2017; Turchin et al. 2015; Whitehouse et al. 2019; Slingerland et al. 2018–2019). The promise of large, widely accessible databases is the opportunity for rigorous statistical testing of plausible historical models. The peril is the temptation to ransack these databases for heretofore unknown statistical patterns. Statisticians bearing algorithms are a poor substitute for expertise.

Author Biography

Gary N. Smith, Pomona College

Gary N. Smith is the Fletcher Jones Professor of Economics at Pomona College, Claremont, CA. Smith has a long history of research projects debunking dubious uses of data in statistical analysis. He is the author of eight textbooks, seven trade books, nearly 100 academic papers, and seven software programs on economics, finance, and statistics. His book The AI Delusion (Oxford University Press, 2018) argues that, in the age of Big Data, the real danger is not that computers are smarter than us, but that we think computers are smarter than us and therefore trust computers to make important decisions for us. His most recent books are The 9 Pitfalls of Data Science (Oxford University Press, 2019; winner of the PROSE award for Excellence in Popular Science & Popular Mathematics) and The Phantom Pattern Problem: The Mirage of Big Data (Oxford University Press, 2020), both co-authored with Jay Cordes.

References

Akaev, A. A., V. I. Pantin, and A. E. Ayvazov. 2009. “Analiz dinamiki dvizheniya mirovogo ekonomicheskogo krizisa na osnove teorii tsiklov.” Doklad na Pervom Rossiyskom ekonomicheskom kongresse, MGU im. M.V. Lomonosova [“Analysis of the Dynamics of Motion of the Global Economic Crisis on the Basis of the Theory of Cycles.” Paper presented at the First Russian Economic Congress, Moscow State University].

Ambasciano, L. 2017. “Exiting the Motel of the Mysteries? How Historiographical Floccinaucinihilipilification Is Affecting CSR 2.0.” In Religion Explained? The Cognitive Science of Religion after Twenty-Five Years, eds L. H. Martin and D. Wiebe, 107–22. London and New York: Bloomsbury. https://doi.org/10.5040/9781350032491.ch-009

Artigue, H. M. and G. Smith. 2019. “The Principal Problem with Principal Components Regression.” Cogent Mathematics & Statistics 6(1): 1622190. https://doi.org/10.1080/25742558.2019.1622190

Babyak, M. A. 2004. “What You See May Not Be What You Get: A Brief, Nontechnical Introduction to Overfitting In Regression-Type Models.” Psychosomatic Medicine 66(3): 411–21. https://doi.org/10.1097/01.psy.0000127692.23278.a9 DOI: https://doi.org/10.1097/00006842-200405000-00021

Begoli, E. and J. Horsey. 2012. “Design Principles for Effective Knowledge Discovery From Big Data.” Software Architecture (WICSA) and European Conference on Software Architecture (ECSA), 2012 Joint Working IEEE/IFIP Conference. https://doi.org/10.1109/WICSA-ECSA.212.32

Calude, C. S. and G. Longo. 2017. “The Deluge of Spurious Correlations in Big Data.” Foundations of Science 22(3): 595–612. https://doi.org/10.1007/s10699-016-9489-4

Chase-Dunn, C. and B. Podobnik. 1995. “The Next World War: World-System Cycles and Trends.” Journal of World-Systems Research 1(1): 1–47. https://doi.org/10.5195/jwsr.1995.40 DOI: https://doi.org/10.5195/JWSR.1995.39

Cios, K. J., W. Pedrycz, R. W. Swiniarski, and L. A. Kurgan. 2007. Data Mining: A Knowledge Discovery Approach. New York: Springer.

Elliott, R. N. 1938. The Wave Principle. New York: Elliott.

Fayyad, U., G. Piatetsky-Shapiro, and P. Smyth. 1996. “From Data Mining To Knowledge Discovery in Databases.” AI Magazine 17(3): 37–54. https://doi.org/10.1609/aimag.v17i3.1230

Goldstein, J. 1988. Long Cycles: Prosperity and War in the Modern Age. New Haven, CT: Yale University Press.

Guldi, J. and D. Armitage. 2014. The History Manifesto. Cambridge: Cambridge University Press. https://www.cambridge.org/core/what-we-publish/open-access/the-history-manifesto https://doi.org/10.1017/9781139923880

Hendry, D. F. and H. M. Krolzig. 2001. Automatic Econometric Model Selection. London: Timberlake Consultants Press.

Hurvich, C. M. and C. L. Tsai. 1990. “The Impact of Model Selection on Inference in Linear Regression.” American Statistician 44(3): 214–17. https://doi.org/10.2307/2685338

Hocking, R. R. 1976. “The Analysis and Selection of Variables in Linear Regression.” Biometrics 32(1): 1–49. https://doi.org/10.2307/2529336

Hotelling, H. 1933. “Analysis of a Complex of Statistical Variables into Principal Components.” Journal of Educational Psychology 24: 417–41, 498–520. https://psycnet.apa.org/doi/10.1037/h0071325 https://doi.org/10.1037/h0070888

Hotelling, H. 1936. “Relations Between Two Sets of Variates.” Biometrika 28(3–4): 321–77. https://doi.org/10.2307/2333955 DOI: https://doi.org/10.1093/biomet/28.3-4.321

Hotelling, H. 1957. “The Relations of the Newer Multivariate Statistical Methods to Factor Analysis.” British Journal of Statistical Psychology 10(2): 69–79. https://doi.org/10.1111/j.2044-8317.1957.tb00179.x

Kendall, M. G. 1957. A Course in Multivariate Analysis. London: Griffin.

Kohler, T. A. 2018. “Our Unfinished Agenda (What I Have Learned).” The SAA Archaeological Record 18(5): 37–42. http://onlinedigeditions.com/publication/?i=542220&article_id=3236418&view=articleBrowser

Kondratieff, N. D. 1925. The Major Economic Cycles (in Russian). Moscow. Translated and published in 1984 as The Long Wave Cycle. New York: Richardson & Snyder.

Kondratieff, N. D. and W. F. Stolper. 1935. “The Long Waves in Economic Life.” Review of Economic Statistics 17(6): 105–15. https://doi.org/10.2307/1928486

Mandel, E. 1980. Long Waves of Capitalist Development: The Marxist Interpretation. Based on The Marshall Lectures Given at the University of Cambridge, 1978. Cambridge: Cambridge University Press.

Mansfield, E. R., J. T. Webster, and R. F. Gunst. 1977. “An Analytic Variable Selection Technique for Principal Component Regression.” Applied Statistics 26(1): 34–40. https://doi.org/10.2307/2346865

Modelski, G., and W. R. Thompson. 1996. Leading Sectors and World Powers: The Coevolution of Global Economics and Politics. Columbia: University of South Carolina Press.

Mosteller, F., and J. W. Tukey. 1977. Data Analysis and Regression: A Second Course in Statistics. Reading, MA: Addison-Wesley.

Pearson, K. 1901. “On Lines and Planes of Closest Fit to Systems of Points in Space.” Philosophical Magazine 2(11): 559–72. https://doi.org/10.1080/14786440109462720

Quigley, C. 2012. “Kondratieff Waves and the Greater Depression of 2013–2020.” https://www.financialsense.com/contributors/christopher-quigley/kondratieff-waves-and-the-greater-depression-of-2013-2020. Accessed 10 March 2020.

Sagiroglu, S. and D. Sinanc. 2013. “Big Data: A Review.” Collaboration Technologies and Systems (CTS), 2013 International Conference. https://doi.org/10.1109/CTS.2013.6567202

Salum, F. and P. Vicente. 2017. “The Next Cycle of Capitalism.” INSEAD Knowledge. https://knowledge.insead.edu/strategy/the-next-cycle-of-capitalism-5226. Accessed 10 March 2020.

Skwarek, S. n.d. “Kondratieff Wave.” CMT Association. https://cmtassociation.org/kb/kondratieff-wave/. Accessed 20 December 2018.

Slingerland, E. and B. Sullivan. 2017. “Durkheim with Data: The Database of Religious History.” Journal of the American Academy of Religion 85(2): 312–47. https://doi.org/10.1093/jaarel/lfw012

Slingerland, E., et al. 2018–2019. “Historians Respond to Whitehouse et al. (2019), ‘Complex Societies Precede Moralizing Gods Throughout World History.’” Journal of Cognitive Historiography 5(1–2): 124–41. https://doi.org/10.1558/jch.39393

Smith, G. 2018a. “Step Away From StepWise.” Journal of Big Data 5: 32. https://doi.org/10.1186/s40537-018-0143-6

Smith, G. 2018b. The AI Delusion. Oxford: Oxford University Press. https://doi.org/10.1093/oso/9780198824305.001.0001

Smith, G. 2020. “Data Mining Fool’s Gold.” Journal of Information Technology. https://doi.org/10.1177/0268396220915600

Smith, G. and J. Cordes. 2019. The 9 Pitfalls of Data Science. Oxford: Oxford University Press. https://doi.org/10.1093/oso/9780198844396.001.0001

Smith, G. and J. Cordes. 2020. The Phantom Pattern Problem: The Mirage of Big Data. Oxford: Oxford University Press. https://doi.org/10.1093/oso/9780198864165.001.0001

Spinney, L. 2019. “History as a Giant Data Set: How Analysing the Past Could Help Save the Future.” The Guardian, 12 November. https://www.theguardian.com/technology/2019/nov/12/history-as-a-giant-data-set-how-analysing-the-past-could-help-save-the-future. Accessed 10 March 2020.

Stevenson, P. W. 2016. “Professor Who Predicted 30 Years of Presidential Elections Correctly Called a Trump Win in September.” The Washington Post, 8 November.

Thompson, B. 1995. “Stepwise Regression and Stepwise Discriminant Analysis Need Not Apply Here: A Guidelines Editorial.” Educational and Psychological Measurement 55: 525–34. https://doi.org/10.1177/0013164495055004001

Tosh, N., J. Ferguson, and C. Seoighe. 2018. “History by the numbers?” Proceedings of the National Academy of Sciences 115(26): E5840. https://doi.org/10.1073/pnas.1807023115

Turchin, P. 2003. Historical Dynamics: Why States Rise and Fall. Princeton and Oxford: Princeton University Press. https://doi.org/10.1515/9781400889310

Turchin, P., et al. 2015. “Seshat: The Global History Databank.” Cliodynamics 6(1): 77–107. https://doi.org/10.21237/C7clio6127917

Turchin, P., et al. 2018. “Quantitative Historical Analysis Uncovers a Single Dimension of Complexity that Structures Global Variation in Human Social Organization.” Proceedings of the National Academy of Sciences 115(2): E144–E151. https://doi.org/10.1073/pnas.1708800115

Whitehouse, H., et al. 2019. “Complex Societies Precede Moralizing Gods Throughout World History.” Nature 568: 226–29. https://doi.org/10.1038/s41586-019-1043-4

Published

2022-01-06

How to Cite

Smith, G. N. (2022). The Promise and Peril of the Data Deluge for Historians. Journal of Cognitive Historiography, 6(1–2), 277–287. https://doi.org/10.1558/jch.21156

Section

Commentary