MOVING BLACK-BOXING TOWARDS STATISTICS: CASE STUDIES FROM AMERICAN FOOTBALL
Abstract
Over the past decade, the explosion of publicly available data and off-the-shelf machine learning (ML) tools has popularized a common data science workflow: (1) obtain a dataset, (2) fit a black-box ML model, and (3) use its predictions. Large language models (LLMs) have streamlined this workflow even further: upload a dataset to ChatGPT, and it will fit a model without requiring any specification. This paradigm is especially prevalent in sports analytics. While the modern ML pipeline excels in data-rich environments, it struggles with challenges that statisticians traditionally consider, such as limited data, selection bias, strong dependency structures, and the need for uncertainty quantification. These challenges are pervasive in sports analytics. Hence, we propose a shift across data science away from the typical black-box machine learning workflow and towards statistical thinking. We illustrate this shift through case studies from American football: expected points, win probability, and NFL draft position value curves.