Machine learning based statistical inference in sports analytics
Robert Bajons, WU Vienna
Co-author: Lucas Kook, WU Vienna
Abstract: Identifying which factors are predictive of an outcome (e.g., scoring a goal) in the presence of other features (e.g., position of the shooter) is a fundamental task in sports analytics. In practice, this task is commonly addressed using feature importance measures derived from machine learning algorithms. However, such measures typically offer limited interpretability and do not yield valid statistical inference. Here, we achieve valid inference by using machine learning based nonparametric conditional independence tests to (i) identify strong shooters in soccer based on goals above expectation and (ii) assess the influence of statistically derived motion features on defensive coverage schemes in the NFL. We further relate these tests to a partially linear logistic regression model to facilitate interpretation.
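
For orientation, the link between a conditional independence test and a partially linear logistic regression model can be sketched as follows. The notation is an assumption made for illustration and does not appear in the abstract: Y is the binary outcome (e.g., goal or no goal), X the factor of interest (e.g., an indicator for the shooter), Z the remaining features (e.g., shot position), \beta a scalar or vector coefficient, and g an unspecified function that could be estimated by a machine learning method.

```latex
% Partially linear logistic regression model (illustrative notation):
% X enters linearly on the logit scale, Z enters through an unknown function g.
\[
  \operatorname{logit}\, \mathbb{P}(Y = 1 \mid X, Z) \;=\; X^\top \beta + g(Z)
\]
% Null hypothesis of a conditional independence test:
\[
  H_0 \colon\; Y \perp\!\!\!\perp X \mid Z
\]
% Within the model above, \beta = 0 implies that H_0 holds.
\[
  \beta = 0 \;\Longrightarrow\; \mathbb{P}(Y = 1 \mid X, Z) = \mathbb{P}(Y = 1 \mid Z)
\]
```

Under this reading, rejecting H_0 indicates that X carries information about Y beyond Z, and \beta, where the model is a reasonable approximation, expresses that contribution as a conditional log odds ratio.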