A Hierarchical Bayesian Variable Selection Approach to Major League Baseball Hitting Metrics

Penn collection
Statistics Papers
Discipline
Subject
baseball
hierarchical
Bayesian
mixture
model
random effects
variable selection
Statistics and Probability
Author
McShane, Blakeley B
Braunstein, Alexander
Piette, James M
Jensen, Shane T
Abstract

Numerous statistics have been proposed to measure offensive ability in Major League Baseball. While some of these measures may offer moderate predictive power in certain situations, it is unclear which simple offensive metrics are the most reliable or consistent. We address this issue by using a hierarchical Bayesian variable selection model to determine which offensive metrics are most predictive within players across time. Our sophisticated methodology allows for full estimation of the posterior distributions for our parameters and automatically adjusts for multiple testing, providing a distinct advantage over alternative approaches. We implement our model on a set of fifty different offensive metrics and discuss our results in comparison with other variable selection techniques. We find that a large number of metrics demonstrate signal. However, (i) these metrics are highly correlated with one another, (ii) they can be reduced to about five without much loss of information, and (iii) these five relate to traditional notions of performance (e.g., plate discipline, power, and ability to make contact).

Publication date
2011-10-01
Journal title
Journal of Quantitative Analysis in Sports
Collection