The Premier League season is coming around again and I am collecting predictions this week. Please check my Twitter timeline for details (@simongleave) or contact me there. Jan Mullenberg (@nacstatistieken), with assistance from Richard Lochten, has kindly written the blog below on last season’s predictions.
Over the last three years, Simon Gleave has been gathering data for an interesting study. Each year, a large number of respondents (football fans with different backgrounds) submit their pre-season predictions of the final Premier League points totals: forecasts of a 38-game season based on knowledge, emotions and numbers. For clarity, we have divided last season’s group of 91 respondents into four smaller groups, namely fans (n=29), statistical models (n=42), media (n=16), and bookmakers (n=4).
First of all, some facts from the 2015/2016 predictions:
– Chelsea was generally predicted to win the Premier League.
– According to general expectations, Sunderland, Norwich and Watford should have been relegated to the Football League Championship.
– There was a real dichotomy in Leicester City’s pre-season predictions: statistical models and bookies expected more from Leicester City (5.8 more points) than fans and media.
– Modellers had a slightly lower expected outcome for Spurs compared to other groups.
– Journalists were more positive about the chances of Crystal Palace.
Correlation per matchday
In this research we used the Pearson correlation coefficient to evaluate the predictions. Correlation measures the extent to which two variables are associated, in this case the relationship between expected points from pre-season predictions and actual points at the end of the season. The closer the correlation is to one, the better a group’s predictions. The correlation coefficient always needs a few rounds of matches to stabilise and become an accurate measure, but it usually stays at the level it reaches after about nine weeks. Last season, across the whole group of predictions, it reached its maximum (0.61) after nine matches, as we would expect. After that round, however, the correlation declined to a minimum of 0.36 after 17 matches, just before the halfway point of the league season.
The freefall between round 9 and 17 was mainly due to the underestimation of Leicester City and newly promoted Watford. On the other hand, the overestimation of Chelsea strongly influenced the correlation coefficient at this stage of the season too. After round 17 the correlation increased, but it never managed to reach the peak it reached after nine matches again. After the final matchday the four groups of predictors averaged a correlation of just 0.51, a lot lower than we have seen in other seasons. This is an indication of how unpredictable last season was.
The bookmakers (M = 0.54, SD= 0.02) and the statistical models (M = 0.53, SD = 0.08) did significantly better than the media (M = 0.47, SD = 0.05). The predictions of the fans (M = 0.50, SD = 0.04) settled right between the media and the mathematical models.
Are there possibly underlying causes for the differences in correlations between the four groups? To answer this question, we have to take a closer look at the difference between points predicted and points achieved for each team, which is known as the absolute deviation.
Absolute deviation of teams
When examining these differences, we consider both over- and underestimation of each team’s points. The next graph shows the absolute deviation between the number of points forecast per team and the actual points per team (underestimation is shown in orange in the graph below, overestimation in red). Current champions Leicester City deviated strongly from the forecasts: in an absolute sense, the predictors were wrong about ‘The Foxes’ by an average of around 42 points.
Moreover, the general overestimation of Chelsea, and of the eventually relegated Aston Villa, dragged the correlation down to such a moderate level. Sunderland and West Bromwich Albion showed the smallest differences between prediction and outcome and were therefore predicted most accurately. On average, the respondents expected WBA to get around 3 points – or just one win – fewer than they achieved. However, this figure reflects the performance of the entire group of 91 predictors. What happens when we split these deviations into the groups of fans, media, models and bookies?
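The per-team figure behind this comparison is simply the mean absolute deviation across predictors. A short sketch with invented prediction numbers (the actual points totals for the two teams are the real 2015/2016 figures):

```python
# Hypothetical individual predictions (points per team, one value per predictor).
predictions = {
    "Leicester City": [40, 38, 45, 42],
    "West Bromwich Albion": [44, 41, 43, 42],
}
# Actual 2015/2016 final points totals.
actual = {"Leicester City": 81, "West Bromwich Albion": 43}

for team, preds in predictions.items():
    # Mean absolute deviation: average size of the error, ignoring direction.
    mad = sum(abs(p - actual[team]) for p in preds) / len(preds)
    print(f"{team}: mean absolute deviation = {mad:.2f} points")
```

Computing this per team within each predictor group produces the group comparison discussed next.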
The average differences per group are shown below. In predicting season-long performance, the statistical models and bookies did a relatively better job on Leicester City, Arsenal and Liverpool, although the models showed large deviations for Tottenham Hotspur. Overall, the bookmakers are in pole position, as we might expect; Manchester City was the only team on which they deviated the most. In comparison to the rest, the journalists scored worst: for nine of the 20 teams, the journalists’ forecast was the most inaccurate, and their assessment of Crystal Palace differed most from the other groups. The average points differences of each group of predictors for each team are shown below:
Technical note: the p-values come from an Analysis of Variance (ANOVA) testing whether there is a significant difference between the groups. If the p-value is low, as it is for the first few teams, the differences between the groups are statistically significant.
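For readers who want to see the mechanics, here is a stdlib sketch of the one-way ANOVA F statistic (in practice one would use `scipy.stats.f_oneway`, which also returns the p-value). The deviation figures per group are invented for illustration, not taken from the study:

```python
# Made-up absolute deviations (in points) for one team, by predictor group.
groups = {
    "fans":    [10, 12, 9, 11],
    "models":  [5, 6, 4, 7],
    "media":   [14, 13, 15, 12],
    "bookies": [5, 6, 5, 6],
}

k = len(groups)                                  # number of groups
n_total = sum(len(g) for g in groups.values())   # total observations
grand_mean = sum(sum(g) for g in groups.values()) / n_total

# Between-group sum of squares: how far each group mean sits from the grand mean.
ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2
                 for g in groups.values())
# Within-group sum of squares: spread of observations around their own group mean.
ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g)
                for g in groups.values())

# F = between-group variance / within-group variance.
f_stat = (ss_between / (k - 1)) / (ss_within / (n_total - k))
print(f"F = {f_stat:.2f}")
```

A large F (relative to the F distribution with k-1 and N-k degrees of freedom) means the group means differ more than chance alone would suggest, which is what a low p-value reports; obtaining the p-value itself requires the F distribution’s CDF, e.g. from scipy.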
In the 2015/2016 Premier League season, everyone’s prediction was way off. First of all, Leicester City suddenly leapt from nowhere to win the league. No team had ever finished outside the top three in the Premier League and then won the title a season later. A sporting anomaly! Was there a sign? Apparently not, given the average deviation of 42.18 points from Leicester’s predicted points total pre-season. One must note, however, that Leicester’s end to the 2014/2015 season was spectacular, with seven wins and a draw from nine games. Models and bookies outperformed the fans and media on ‘The Foxes’. Could there have been some mathematical explanation in Leicester’s success factors?
Were there other Premier League surprises? Yes: Aston Villa and Chelsea, according to the data. Aston Villa avoided relegation in 2014/2015, only to go down a season later with a spectacularly low points total. Some entrants predicted their relegation, but not such a low points total. As for Chelsea, one of the richest clubs in England, they managed just 50 points and finished mid-table after winning the league a year earlier. Models using financial inputs were very poor predictors last season as the established order was overturned.
Looking further into the results of the four groups of predictors, we noticed several differences. The bookmakers were the best predictors, though only marginally better than the statistical models. Those statistical models did relatively well – or perhaps ‘less badly’ – due to their conservatism at the top and bottom of the ranking. It is a function of their methodology that models assign more points to potential relegation candidates and fewer points to possible champions. The models therefore benefit in situations where relegation teams overperform and possible title contenders underperform. In contrast to the models, the media experts tend to overrate established teams and underestimate newer arrivals to the Premier League (as explained by David Sumpter in a previous blog). As a result of the topsy-turvy 2015/2016 season, the media deviated the most from the actual standings.
A ratatouille of insights may provide plenty of explanations for the curious 2015/2016 season, but that is probably just being wise after the fact. At Scoreboard Journalism, we are already excited about the new season and a new round of predictions. We wish all participants plenty of wisdom and luck in completing their forecasts – and let’s have some regression to the mean this season!