Machine learning based statistical inference in sports analytics

Robert Bajons, WU Vienna

Co-author: Lucas Kook, WU Vienna

Abstract: Identifying which factors are predictive of an outcome (e.g., scoring a goal) in the presence of other features (e.g., position of the shooter) is a fundamental task in sports analytics. In practice, this task is commonly addressed using feature importance measures derived from machine learning algorithms. However, such measures typically offer limited interpretability and do not support valid statistical inference. Here, we achieve valid inference by using machine learning based nonparametric conditional independence tests to (i) identify strong shooters based on goals above expectation in soccer and (ii) assess the influence of statistically derived motion features on defensive coverage schemes in the NFL. We further relate these tests to a partially linear logistic regression model to facilitate interpretation.
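
To give a rough idea of what a machine learning based conditional independence test can look like, the sketch below implements a residual-product test in the style of the generalised covariance measure (GCM). The abstract does not name the specific test used, so the choice of GCM, the leave-one-out k-nearest-neighbour regressor, the function names (knn_predict, gcm_test, simulate), and the simulated data are all assumptions made purely for illustration. Here x stands for the feature of interest, y for a binary outcome such as a goal, and z for a single confounding feature.

```python
import math
import random
from statistics import NormalDist, fmean


def knn_predict(z, target, k=20):
    """Leave-one-out k-nearest-neighbour regression of target on scalar z.

    Stand-in for any flexible regression method (assumption, not the
    authors' choice).
    """
    n = len(z)
    preds = []
    for i in range(n):
        # Indices of the k points closest to z[i], excluding i itself.
        neighbours = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: abs(z[j] - z[i]),
        )[:k]
        preds.append(fmean(target[j] for j in neighbours))
    return preds


def gcm_test(x, y, z, regress=knn_predict):
    """Return (statistic, p_value) for H0: X independent of Y given Z."""
    n = len(x)
    # Residuals of X and Y after regressing each on Z.
    rx = [xi - pi for xi, pi in zip(x, regress(z, x))]
    ry = [yi - pi for yi, pi in zip(y, regress(z, y))]
    prod = [a * b for a, b in zip(rx, ry)]
    mean = fmean(prod)
    var = fmean(p * p for p in prod) - mean ** 2
    # Normalised mean of residual products; approximately N(0, 1) under H0.
    stat = math.sqrt(n) * mean / math.sqrt(var)
    p_value = 2 * (1 - NormalDist().cdf(abs(stat)))
    return stat, p_value


def simulate(n, effect, seed):
    """Hypothetical data: y depends on z, and on x only if effect != 0."""
    rng = random.Random(seed)
    z = [rng.uniform(-3, 3) for _ in range(n)]
    x = [math.sin(zi) + rng.gauss(0, 1) for zi in z]
    prob = [1 / (1 + math.exp(-(0.5 * zi + effect * xi)))
            for zi, xi in zip(z, x)]
    y = [1.0 if rng.random() < p else 0.0 for p in prob]
    return x, y, z


if __name__ == "__main__":
    x, y, z = simulate(400, effect=1.5, seed=1)
    print(gcm_test(x, y, z))
```

Passing a different callable as regress swaps in another learner without changing the test itself. A small p-value suggests that x carries information about y beyond what z already explains, which is the kind of question posed in both applications above.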