Analyzing Baseball Umpires Using Machine Learning: A Statistical Learning Approach

Authors

Brown, Sean-Paul

Date of Issue

2026-04-29

Type

Thesis

Language

en_US

Subject Keywords

Research Subject Categories::MATHEMATICS::Applied mathematics::Mathematical statistics, Research Subject Categories::SOCIAL SCIENCES::Statistics, computer and systems science::Informatics, computer and systems science::Data processing

Other Titles

Development and Analysis of Baseball Statistics

Abstract

This project develops a probability-based grading system for Major League Baseball umpires using Statcast pitch-level data. Traditional evaluations rely on raw call accuracy, which fails to account for the difficulty of judging high-velocity, high-spin pitches against an invisible strike zone. This study employs XGBoost and logistic regression models to estimate the probability that a pitch is called correctly, using engineered features such as the pitch's distance from the closest edge of the strike zone and its velocity. Hyperparameters were tuned via Bayesian optimization to minimize log-loss, yielding well-calibrated probability estimates. The predicted probabilities feed an exponential payoff function, P(η, c), where η represents the ease of a call, derived from the log-odds of a correct call, and c denotes whether the call was correct. This formulation rewards umpires more for correctly calling difficult pitches and penalizes errors more heavily on easy pitches, producing a more nuanced measure of performance than binary accuracy. On the test set, the XGBoost model achieved a log-loss of 0.254, an AUC of 0.875, a Brier score of 0.081, and 88.1% accuracy. The results indicate that difficulty-adjusted grading better differentiates umpire performance, providing a robust, quantitative framework for evaluating accuracy under high-pressure, real-world conditions.
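One of the engineered features named in the abstract is the distance from the closest edge of the strike zone. The sketch below shows one plausible way to compute it as a signed distance: positive inside the zone, negative outside. It assumes Statcast's plate_x, plate_z, sz_bot and sz_top columns (all in feet), a zone 17 inches wide centred on the plate, and measurement from the ball's centre with no radius adjustment. The function name edge_distance is hypothetical, and the thesis's exact definition may differ.

```python
import math

# Half the width of home plate (17 inches), in feet to match Statcast units.
# Assumption: distances are measured from the ball's centre (no radius adjustment).
HALF_PLATE_WIDTH_FT = 17 / 2 / 12


def edge_distance(plate_x, plate_z, sz_bot, sz_top, half_width=HALF_PLATE_WIDTH_FT):
    """Hypothetical signed distance (feet) from a pitch to the nearest zone edge.

    Positive inside the zone, negative outside, zero on the boundary.
    """
    # How far the pitch lies beyond the zone horizontally and vertically
    # (both values are negative when the pitch is inside).
    dx = abs(plate_x) - half_width
    dz = max(sz_bot - plate_z, plate_z - sz_top)
    if dx <= 0 and dz <= 0:
        # Inside: the nearest edge is whichever side is closest.
        return min(-dx, -dz)
    # Outside: Euclidean distance to the rectangle, which also covers corners.
    return -math.hypot(max(dx, 0.0), max(dz, 0.0))


if __name__ == "__main__":
    # A pitch down the middle of a 1.5 ft to 3.5 ft zone: about 0.708 ft from the side edges.
    print(edge_distance(0.0, 2.5, 1.5, 3.5))
    # A pitch 1 ft off centre at mid-height: about 0.292 ft outside the zone.
    print(edge_distance(1.0, 2.5, 1.5, 3.5))
```

A signed distance keeps "borderline inside" and "borderline outside" pitches adjacent on one numeric scale, which a tree model such as XGBoost can split on directly.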

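The abstract defines η as the ease of a call, derived from the log-odds of a correct call, and describes an exponential payoff P(η, c) without giving its exact form. The sketch below is a hypothetical instance consistent with that description, not the thesis's formula. With p the model's probability of a correct call and η = log(p / (1 - p)), a correct call earns exp(-η) = (1 - p)/p, and a missed call costs exp(η) = p/(1 - p). The names ease, payoff and umpire_score, the clipping constant eps, and averaging as the per-umpire aggregate are all assumptions.

```python
import math


def ease(p_correct, eps=1e-6):
    """Ease of a call: log-odds of the model's probability of a correct call."""
    # Clip p away from 0 and 1 so the log-odds stays finite (assumed safeguard).
    p = min(max(p_correct, eps), 1 - eps)
    return math.log(p / (1 - p))


def payoff(eta, correct):
    """Hypothetical exponential payoff P(eta, c).

    Correct calls earn exp(-eta): large for hard pitches (eta < 0), small for easy ones.
    Missed calls cost exp(eta): large for easy pitches (eta > 0), small for hard ones.
    """
    return math.exp(-eta) if correct else -math.exp(eta)


def umpire_score(calls):
    """Average payoff over (p_correct, correct) pairs for one umpire (assumed aggregate)."""
    return sum(payoff(ease(p), c) for p, c in calls) / len(calls)


if __name__ == "__main__":
    # Easy pitch (p = 0.9): a correct call earns 1/9, while a miss costs 9.
    print(payoff(ease(0.9), True), payoff(ease(0.9), False))
    # Hard pitch (p = 0.2): a correct call earns 4, while a miss costs only 0.25.
    print(payoff(ease(0.2), True), payoff(ease(0.2), False))
```

At p = 0.5 both outcomes are worth exactly 1 in magnitude, so this form is symmetric around a coin-flip call. The asymmetry grows exponentially in η, which matches the abstract's description of rewarding hard correct calls and penalizing easy misses.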