MOVING BLACK-BOXING TOWARDS STATISTICS: CASE STUDIES FROM AMERICAN FOOTBALL

Brill, Ryan

Files

Brill_upenngdas_0175C_16862.pdf (18.87 MB)

Degree type

Doctor of Philosophy (PhD)

Graduate group

Applied Mathematics and Computational Science

Discipline

Statistics and Probability

Copyright date

2025

Permalink

https://repository.upenn.edu/handle/20.500.14332/61229

Abstract

Over the past decade, the explosion of publicly available data and off-the-shelf machine learning (ML) tools has popularized a common data science workflow: (1) obtain a dataset, (2) fit a black-box ML model, and (3) use its predictions. Large language models (LLMs) have streamlined this workflow even further: upload a dataset to ChatGPT, and it will fit a model without requiring any specification. This paradigm is especially prevalent in sports analytics. While the modern ML pipeline excels in data-rich environments, it struggles with challenges that statisticians traditionally consider, such as limited data, selection bias, strong dependency structures, and the need for uncertainty quantification. These challenges are pervasive in sports analytics. Hence, we propose that data science shift its emphasis away from the typical black-box ML workflow and towards statistical thinking. We illustrate this shift through case studies from American football: expected points, win probability, and NFL draft position value curves.

Advisor

Wyner, Abraham J.

Date of degree

2025

Collection

Dissertations and Theses