The maths that shows that football can’t be predicted.
By David Sumpter, Professor of Applied Mathematics and author of Soccermatics
With assistance from Simon Gleave, author of the Scoreboard Journalism blog
The Premier League has been turned upside down this season. Leicester City, relegation battlers last year, have pulled off the impossible and won the league. Chelsea, unassailable last season, have fallen, winning only six of their first 21 matches of this season. West Ham United have beaten all six of England’s richest clubs: Arsenal, Chelsea, Manchester City, Manchester United, Liverpool (twice) and Tottenham Hotspur. Manchester United seem to have lost their way, with Spurs taking their place as the team whose home-grown youngsters play exciting football.
Were any of these changes predictable in advance? It is easy to be wise after the fact (to discuss the merits of Leicester’s 4-4-2 revival, the dark side of Jose Mourinho’s temperament, and to claim that Arsene Wenger’s and Louis Van Gaal’s tactics are dated), but did any of the experts see these changes coming?
Before the start of the season, Simon collected data that allows us to find out. In August, he received league table predictions from 17 journalists, including Michael Cox at the Guardian, Robbie Savage at the Mirror, Jonathan Liew at the Telegraph, Phil McNulty at the BBC, Joe Prince-Wright at NBC and Mark Langdon at the Racing Post. These media experts should be given credit: they were willing to put their reputations on the line and list, from 1 to 20, how they thought the table would look at the end of the season. Not all pundits are willing to take such a risk.
Experts make the same mistakes
So how did the experts do? Let’s start by looking at their predictions for this season’s biggest surprises: Leicester and Chelsea. Below are histograms of the predictions compared to where the teams finished last season (red lines).
10 out of 17 of the journalists thought Chelsea would win the Premier League, and all but one placed them in the top three. Only Talksport’s Mark Saggers predicted a significant decline, placing Chelsea 5th. Every one of the 17 pundits thought that Leicester City would do worse this season than last. They attributed Leicester’s late rise to 14th at the end of last season to a lucky streak, and predicted they would return to the bottom of the table. The reversal of fortunes between Chelsea and Leicester was totally unanticipated in the media.
Comparing histograms of the experts’ predictions for each of the other 18 teams in the Premiership, a clear pattern emerges. For 15 of the teams, the most common predicted position is within one place of last season’s finish. For example, below are histograms for Liverpool, who were typically predicted to move up one place from 6th to 5th, and West Ham, who were predicted to remain around 12th.
These histograms reveal what media experts tend to do when asked for predictions. They make small adjustments to the previous year’s table in order to predict the coming year. They clearly understand the importance of using last season’s result as a benchmark for the next season, and then adjust slightly based on experience.
The season isn’t over, but we can compare the current league table with the journalists’ predictions. We do this below (each journalist is a dot), together with last season’s outcome (solid line).
Media experts can’t reliably predict the outcome of the Premier League. In fact, none of them beat the benchmark of simply writing down last year’s table. The adjustments the experts make in order to ‘fine-tune’ for the coming season make their predictions worse rather than better. Liverpool, Everton, Newcastle and Villa have, contrary to what most experts thought, continued to underperform this season. West Ham, Tottenham and, of course, Leicester have improved more than predicted. The experts tend to be conservative in their predictions, overrating established teams and underestimating newer arrivals to the Premiership.
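One simple way to score a predicted table against the benchmark is to sum, over all teams, the absolute difference between predicted and actual league positions. This is a sketch: the article does not state the exact error metric used in the study, and the four-team positions below are purely illustrative.

```python
# Sketch: score a predicted table with the sum of absolute position errors.
# The metric and the toy positions are illustrative assumptions, not the
# exact method or data from Simon's study.

def table_error(predicted, actual):
    """Sum over teams of |predicted position - actual position|."""
    return sum(abs(predicted[team] - actual[team]) for team in actual)

# Hypothetical four-team example.
actual = {"Leicester": 1, "Spurs": 2, "Arsenal": 3, "Chelsea": 10}
benchmark = {"Leicester": 14, "Spurs": 5, "Arsenal": 2, "Chelsea": 1}  # last season's table
pundit = {"Leicester": 18, "Spurs": 6, "Arsenal": 2, "Chelsea": 1}     # a typical expert tweak

print(table_error(benchmark, actual))  # 26
print(table_error(pundit, actual))     # 31: the 'fine-tuning' made things worse
```

In this toy example the pundit’s adjustments (pushing Leicester further down) increase the error relative to simply copying last season’s table, which is the pattern the real predictions showed.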
Ask the fans
It is not just media experts who get it wrong. Simon also collected predictions from 28 fans on Twitter. The fans made very similar predictions, and errors, to the journalists. They too predicted that Everton, Newcastle and Villa would improve and that Leicester and Sunderland would do worse than last season. 75% of fans thought Chelsea would win a second consecutive title. Only 3 out of the 28 made predictions better than last season’s benchmark.
We can give some credit to Sam Jackson, one fan who outperformed last season’s benchmark, for getting the league right. However, from the point of view of our study we have to conclude that he simply got lucky. If 28 fans make guesses, then by chance a few of them will land on the right side of last season’s benchmark. What is remarkable is how few of them beat the benchmark. We are forced to conclude that fans know just as much (or as little) about what will happen during the coming season as the experts.
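The “someone will get lucky” argument can be illustrated with a small simulation. This is a sketch under an assumed model: suppose each fan independently had, say, a 20% chance of beating the benchmark purely by chance. That figure is an illustrative assumption, not an estimate from the study.

```python
import random

# Sketch: with 28 independent guessers, each with an assumed 20% chance of
# beating the benchmark by luck alone, how often does at least one succeed?

def at_least_one_lucky(n_fans=28, p_beat=0.2, trials=10_000, seed=42):
    rng = random.Random(seed)  # seeded for reproducibility
    hits = 0
    for _ in range(trials):
        if any(rng.random() < p_beat for _ in range(n_fans)):
            hits += 1
    return hits / trials

print(at_least_one_lucky())  # very close to 1.0: someone almost always "wins"
```

Even under quite modest individual odds, the chance that at least one of 28 fans beats the benchmark is near certainty (1 − 0.8²⁸ ≈ 0.998), which is why a single successful prediction tells us little on its own.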
If experts and fans aren’t particularly good at beating last season’s benchmark, then maybe statistics has the answer? Researchers, analysts and bloggers who mathematically model football have proposed a wide range of approaches to predicting success. Some models are based on previous results. Others take into account shots on and off target. One popular approach is ‘expected goals’, where each shot or header is assigned a value corresponding to how good the chance is.
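The expected-goals idea can be sketched in a few lines: each shot is given a probability of scoring, and a team’s expected goals for a match is the sum of those probabilities. The per-shot values below are made-up assumptions; real models fit them from shot location, shot type and other factors.

```python
# Sketch of 'expected goals': each shot gets an assumed scoring probability
# (its value), and expected goals is the sum over all of a team's shots.
# These probabilities are illustrative, not output from a fitted model.

def expected_goals(shot_values):
    """Sum of per-shot scoring probabilities for one team in one match."""
    return sum(shot_values)

# A hypothetical match: one big chance plus several speculative efforts.
shots = [0.45, 0.08, 0.05, 0.12, 0.03]
print(round(expected_goals(shots), 2))  # 0.73
```

The appeal of the approach is that a team creating one 0.45-probability chance is rated very differently from a team taking nine hopeful 0.05-probability shots, even though both might record similar raw shot counts.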
Simon collected predictions from 43 different statistical analysts for the 2015/2016 Premier League season. Despite the widely differing methods, overall the statistical models favoured Manchester City, placing them narrowly ahead of Chelsea. 16 of the models picked City to win the league, 14 predicted Chelsea and 9 had Arsenal as winners. Manchester United were predicted to come around 3rd or 4th. Liverpool were typically 5th and Spurs 6th. Not too different from the season before.
None of the models ranked Leicester as potential champions. However, the models did give Leicester a better chance than the football experts. The most common model prediction for Leicester was 13th, compared to the 18th or 19th in which most experts placed them. Models are not subject to the herd mentality and biases that appear to cloud the judgment of experts. Statistical models are based primarily on past performance, of either teams or players, so they more accurately reflect current trends.
While the models did beat the experts, they still have a long way to go before we can rely on them to predict the future. Only five of the 43 models beat last season’s benchmark, and not by particularly large margins. The performance of the models was hardly a resounding demonstration of the power of stats in league prediction.
Simon’s prediction study won’t stop experts and fans from continuing to speculate, and nor should it. But next time you hear an expert make long-term predictions about football or other sports, it is important to put those predictions in the correct context. Many predictions are just a small adjustment to what we all know from common sense: that teams tend to perform similarly from one year to the next. And experts tend to fall into the trap of ‘correcting’ the past to fit their own (usually incorrect) preconceptions about how the world is.
Statistical models avoid the biases that humans are subject to, but it is arguable whether they beat the simple benchmark of looking at what happened the year before. Simon has collected data over three seasons now, and no single model has proved itself capable of reliably beating the previous-year benchmark. In Soccermatics, I analyse both the experts and the models and find no consistent over-performer.
Where models and experts fail is in predicting the really big, important events, like Jose Mourinho’s downfall or Leicester’s surprising run. Predicting game-changers seems to be impossible: in football, in other sports, in politics and in life. There will always be events that no one is able to predict.