MOVING BLACK-BOXING TOWARDS STATISTICS: CASE STUDIES FROM AMERICAN FOOTBALL

Degree type
PhD
Graduate group
Applied Mathematics and Computational Science
Discipline
Statistics and Probability
Copyright date
01/01/2025
Author
Brill, Ryan
Abstract

Over the past decade, the explosion of publicly available data and off-the-shelf machine learning (ML) tools has popularized a common data science workflow: (1) obtain a dataset, (2) fit a black-box ML model, and (3) use its predictions. LLMs have streamlined this workflow even further: upload a dataset to ChatGPT, and it will fit a model without requiring any specification. This paradigm is especially prevalent in sports analytics. While the modern ML pipeline excels in data-rich environments, it struggles with challenges that statisticians traditionally consider, such as limited data, selection bias, strong dependency structures, and the need for uncertainty quantification. These challenges are pervasive in sports analytics. Hence, we propose shifting the emphasis of data science away from the typical black-box ML workflow and towards statistical thinking. We illustrate this shift through case studies from American football: expected points, win probability, and NFL draft position value curves.

Advisor
Wyner, Abraham J.
Date of degree
2025