Research on Statistical Modelling for Sports Predictions

List of Publications

Mark J. Dixon, Stuart G. Coles (1996)

Modelling Association Football Scores

Abstract: A parametric model is developed and fitted to English league and cup football data from 1992 to 1995. The model is motivated by an aim to exploit potential inefficiencies in the association football betting market, and this is examined using bookmakers' odds from 1995 to 1996. The technique is based on a Poisson regression model but is complicated by the data structure and the dynamic nature of teams' performances. Maximum likelihood estimates are shown to be computationally obtainable, and the model is shown to have a positive return when used as the basis of a betting strategy.
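
The Poisson building block mentioned in this abstract can be illustrated with a minimal sketch. It assumes the two teams' goal counts are independent Poisson variables with fixed scoring rates. This is a simplification and not the fitted model of the paper, which the abstract describes as complicated by the data structure and by teams' changing form. The function names, the example rates and the odds are hypothetical.

```python
import math


def poisson_pmf(k, rate):
    """Probability of exactly k goals under a Poisson distribution."""
    return math.exp(-rate) * rate ** k / math.factorial(k)


def outcome_probabilities(home_rate, away_rate, max_goals=15):
    """Return (home win, draw, away win) probabilities.

    Assumes independent Poisson scores; the score grid is truncated
    at max_goals, which loses a negligible amount of probability mass
    for typical football scoring rates.
    """
    home = draw = away = 0.0
    for x in range(max_goals + 1):
        for y in range(max_goals + 1):
            p = poisson_pmf(x, home_rate) * poisson_pmf(y, away_rate)
            if x > y:
                home += p
            elif x == y:
                draw += p
            else:
                away += p
    return home, draw, away


def expected_return(probability, decimal_odds):
    """Expected profit per unit stake for a bet at the given decimal odds."""
    return probability * decimal_odds - 1.0


if __name__ == "__main__":
    # Hypothetical scoring rates and bookmaker odds, for illustration only.
    p_home, p_draw, p_away = outcome_probabilities(1.5, 1.1)
    print(p_home, p_draw, p_away)
    print(expected_return(p_home, 2.2))
```

A bet is attractive under this sketch only when the expected return is positive, that is, when the model probability exceeds the probability implied by the odds.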


Kenneth Massey (1997)

Statistical Models Applied to the Rating of Sports Teams

Abstract: One of the most intriguing aspects of sports is that it thrives on controversy. Fans, the media, and even players continually argue the issue of which team is best, a question that can ultimately be resolved only by playing the game. Or can it? It is likely that a significant portion of sporting results could be regarded as flukes. Simply put, the superior team does not always win. Therefore even a playoff, although it may determine a champion, will not necessarily end all disagreement as to which team is actually the best.


Leonhard Knorr-Held (1999)

Dynamic Rating of Sports Teams

Abstract: We consider the problem of dynamically rating sports teams on the basis of categorical outcomes of paired comparisons such as win, draw and loss in football. Our modelling framework is the cumulative link model for ordered responses, where latent parameters represent the strength of each team. A dynamic extension of this model is proposed with close connections to nonparametric smoothing methods. As a consequence, recent results have more influence in estimating current abilities than results in the past. We highlight the importance of using a specific constrained random walk prior for time-changing abilities which guarantees an equal treatment of all teams. Estimation is done with an extended Kalman filter and smoother algorithm. An additional hyperparameter which determines the temporal dynamic of the latent team abilities is chosen on the basis of the optimal one-step-ahead predictive power. Alternative estimation methods are also considered. We apply our method to the results from the German football league Bundesliga 1996-1997 and to the results from the American National Basketball Association 1996-1997.
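
A cumulative link model for win, draw and loss can be sketched as follows. The sketch assumes a logistic link and two fixed thresholds, both chosen here for illustration; the abstract does not state which link function is used. It covers only the static outcome probabilities for a given difference in latent strength, not the random walk prior or the Kalman filter estimation.

```python
import math


def logistic_cdf(t):
    """Standard logistic distribution function."""
    return 1.0 / (1.0 + math.exp(-t))


def outcome_probs(strength_diff, thresholds=(-0.5, 0.5)):
    """Return (loss, draw, win) probabilities for the first team.

    strength_diff is the first team's latent strength minus the
    opponent's. The thresholds are hypothetical values that must be
    increasing; they split the latent scale into three ordered outcomes.
    """
    lower, upper = thresholds
    p_loss = logistic_cdf(lower - strength_diff)
    p_loss_or_draw = logistic_cdf(upper - strength_diff)
    return p_loss, p_loss_or_draw - p_loss, 1.0 - p_loss_or_draw


if __name__ == "__main__":
    for diff in (0.0, 0.5, 1.0):
        print(diff, outcome_probs(diff))
```

With symmetric thresholds and equal strengths, win and loss are equally likely, and the win probability grows as the strength difference increases.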


Emonet Benoit (2000)

Revisiting Statistical Applications in Soccer

Abstract: The present report results from a project that forms part of the 'Science, Technique and Society' curriculum. These projects are carried out during undergraduate studies in the Department of Mathematics at the Swiss Federal Institute of Technology (EPFL). Our main goal was to review the statistical work related to soccer through articles published in statistical journals; see the references for an extensive list. To do so, we merged them together in order to give an overall view of the different investigations.


Dimitris Karlis, Ioannis Ntzoufras (2003)

Analysis of Sports Data Using Bivariate Poisson Models

Abstract: Models based on the bivariate Poisson distribution are used for modelling sports data. Independent Poisson distributions are usually adopted to model the number of goals of two competing teams. We replace the independence assumption by considering a bivariate Poisson model and its extensions. The models proposed allow for correlation between the two scores, which is a plausible assumption in sports with two opposing teams competing against each other. The effect of introducing even slight correlation is discussed. Using just a bivariate Poisson distribution can improve model fit and prediction of the number of draws in football games. The model is extended by considering an inflation factor for diagonal terms in the bivariate joint distribution. This inflation improves the precision of the estimation of draws and, at the same time, allows for marginal distributions that are overdispersed relative to the simple Poisson distribution. The properties of the models proposed as well as interpretation and estimation procedures are provided. An illustration of the models is presented by using data sets from football and water-polo.
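
The effect of correlation on draws can be illustrated with a small sketch of the bivariate Poisson distribution in its standard construction: X = W1 + W3 and Y = W2 + W3 with independent Poisson components, so that lam3 is the covariance between the two scores. The parameter values below are hypothetical, and the diagonal inflation extension is not included.

```python
import math


def bivariate_poisson_pmf(x, y, lam1, lam2, lam3):
    """P(X = x, Y = y) for X = W1 + W3, Y = W2 + W3, W_i ~ Poisson(lam_i).

    Requires lam1 > 0 and lam2 > 0; lam3 = 0 gives independent scores.
    """
    base = (math.exp(-(lam1 + lam2 + lam3))
            * lam1 ** x / math.factorial(x)
            * lam2 ** y / math.factorial(y))
    ratio = lam3 / (lam1 * lam2)
    total = sum(
        math.comb(x, k) * math.comb(y, k) * math.factorial(k) * ratio ** k
        for k in range(min(x, y) + 1)
    )
    return base * total


def draw_probability(lam1, lam2, lam3, max_goals=20):
    """Sum of the diagonal terms P(X = k, Y = k), truncated at max_goals."""
    return sum(
        bivariate_poisson_pmf(k, k, lam1, lam2, lam3)
        for k in range(max_goals + 1)
    )


if __name__ == "__main__":
    # Both settings have the same marginal means (1.4 and 1.1 goals);
    # only the covariance lam3 differs.
    print(draw_probability(1.4, 1.1, 0.0))  # independent scores
    print(draw_probability(1.2, 0.9, 0.2))  # covariance 0.2
```

Holding the marginal means fixed, the correlated setting assigns more probability to draws than the independent one, which is the effect the abstract refers to.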


James R. Ashburn, Paul M. Colvert (2006)

A Bayesian Mean-Value Approach for the Ranking of Football Teams

Abstract: We introduce a Bayesian mean-value approach for ranking all college football teams using only win-loss data. This approach is unique in that the prior distribution necessary to handle undefeated and winless teams is calculated self-consistently. Furthermore, we will show statistics supporting the validity of the prior distribution. Finally, a brief comparison with other football rankings will be presented.


Dimitris Karlis, Ioannis Ntzoufras (2007)

Bayesian Modelling of Football Outcomes Using the Skellam Distribution

Abstract: Modelling football match outcomes is becoming increasingly popular for both team managers and betting fans. Most of the existing literature deals with modelling the number of goals scored by each team. In the present paper we work in a different direction. Instead of modelling the number of goals directly, we focus on the difference of the number of goals, i.e. the margin of victory. We recast interest in the so-called Skellam distribution. Modelling the differences instead of the scores themselves has some major advantages. Firstly, we eliminate the correlation imposed by the fact that the two opposing teams compete against each other, and secondly, we do not assume that the goals scored by each team are marginally Poisson distributed. Application of the Bayesian methodology for the Skellam distribution using covariates is discussed. Illustrations using real data from the English Premiership for the season 2006-2007 are provided. The advantages of the proposed approach are also discussed.
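
The Skellam distribution of a goal difference can be sketched numerically. The example below computes it as a truncated convolution of two independent Poisson distributions, which is one way the Skellam distribution arises; it is used here only for illustration and is not the Bayesian estimation procedure of the paper. The rates are hypothetical.

```python
import math


def poisson_pmf(k, rate):
    """Poisson probability computed in log space (requires rate > 0)."""
    return math.exp(k * math.log(rate) - rate - math.lgamma(k + 1))


def skellam_pmf(z, rate1, rate2, terms=100):
    """P(X - Y = z) for independent X ~ Poisson(rate1), Y ~ Poisson(rate2).

    The infinite sum over the second team's goals is truncated after
    `terms` terms, which is ample for football-sized rates.
    """
    start = max(0, -z)
    return sum(
        poisson_pmf(y + z, rate1) * poisson_pmf(y, rate2)
        for y in range(start, start + terms)
    )


if __name__ == "__main__":
    # Hypothetical rates: margin of victory from -3 to +3 goals.
    for margin in range(-3, 4):
        print(margin, skellam_pmf(margin, 1.5, 1.1))
```

The probability of a draw is the value at a margin of zero, and the win probabilities follow by summing over positive or negative margins.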