Last season, I collected 13 forecasts before the 2013/2014 Eredivisie season, the results of which can be found here. Throughout last season, Relative Age Effect expert Steve Lawrence, an Englishman living in Amsterdam, sent me all sorts of graphs relating to the progress of the forecasts, which I never managed to publish. After meeting Steve on the day the 2014 World Cup began, I vowed to do better this season. So here is a guest post from Mr Lawrence looking at last season’s prediction winner @soccermetrics and how this season looks. Who has entered a potentially accurate forecast and who are the outliers? Take it away, Steve...
Howard and Aaron at @soccermetrics set the bar very high when accurately predicting the final 2013/14 Eredivisie table prior to the beginning of that season and it will be interesting to see if the 48 forecasters for the 2014/15 season can achieve the same sort of accuracy.
A simple scatterplot shows just how accurate the forecast from Soccermetrics actually was.
Apart from overestimates for Utrecht by 15 points and PSV by 10 points, and an underestimate for Twente by 10 points, there is a very good fit between predicted points and actual points.
The two components which give the prediction such accuracy are, first, the predicted order of the teams and, second, the predicted shape of the table (for example the range of points from top to bottom, the total points, the mean points achieved by each team and the standard deviation (SD) from that mean).
The final order of the teams for 2014/15 will not be known until the end of the season, but there are some characteristics of the shape of the 2014/15 table which we can have a degree of certainty about in advance by looking at some straightforward descriptive statistics.
In making their predictions each of the forecasters has chosen a mean and a standard deviation for their predicted table. In the case of the 12 prominent media, who only predicted the final order and not the points, these characteristics are imposed as the average for the last 9 seasons.
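As a rough illustration of what "shape" means here, the descriptive statistics can be computed from any predicted points column. The sketch below uses a made-up 18-team table (not a real forecast) and assumes the population standard deviation; the charts in this post may have been built with a different convention.

```python
from statistics import mean, pstdev

# Hypothetical predicted points for an 18-team table (illustrative only).
predicted_points = [80, 72, 66, 60, 55, 52, 49, 47, 45,
                    43, 41, 39, 37, 35, 33, 31, 28, 25]

def table_shape(points):
    """Return the descriptive statistics that define the 'shape' of a table."""
    return {
        "total": sum(points),                # total points in the table
        "range": max(points) - min(points),  # top-to-bottom spread
        "mean": mean(points),                # mean points per team
        "sd": pstdev(points),                # population SD (an assumption)
    }

shape = table_shape(predicted_points)
print(shape)
```

Two forecasts with the same predicted order can still have very different shapes, which is why the shape is treated separately from the order.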
So a first analysis of the 48 predictions comparing the mean and the SD looks like this:
It’s clear that the predictions are clustered around a mean points accumulation of 47 and a SD of about 16.
A few significant outliers are already evident.
It’s worth looking closely at the data for the last 9 seasons (green data points) because these are actual outcomes.
It could be argued that there is a discernible correlation, with a lower mean correlating with a lower SD. This makes sense because a more evenly matched league would produce more draws, and a draw shares only 2 points between the two teams where a win awards 3, so fewer points are accumulated overall. It would also narrow the range between the highest and lowest points accumulated.
The bottom left green data point is season 2013/14, and there is an opinion that the season was more competitive because the cream of Eredivisie players moved abroad before it started, leaving the Eredivisie with a younger average age and more equally matched teams.
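The link between draws and the mean can be made concrete with a little arithmetic. This is a minimal sketch assuming an 18-team double round robin (306 matches), 3 points for a win, 1 each for a draw and no points deductions; the function names are just for illustration.

```python
TEAMS = 18                      # Eredivisie clubs
MATCHES = TEAMS * (TEAMS - 1)   # double round robin: 306 matches

def mean_points(draws):
    """Mean points per team for a season with the given number of draws."""
    decisive = MATCHES - draws
    total = 3 * decisive + 2 * draws   # a win hands out 3 points, a draw only 2
    return total / TEAMS

def draws_for_mean(mean):
    """Invert the relation: number of draws implied by a mean points figure."""
    return 3 * MATCHES - mean * TEAMS

print(mean_points(0))       # no draws at all
print(mean_points(MATCHES)) # every match drawn
print(draws_for_mean(47))   # draws implied by a mean of 47
```

Under these assumptions the mean can only lie between 34 (every match drawn) and 51 (no draws), and a mean of 47 corresponds to 72 drawn matches out of 306.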
Forecasters who think the same applies to 2014/15 might therefore be expected to tend towards a lower SD and a lower mean for their current predictions.
It’s reasonable to expect that the shape of the table for 2014/15 will be within a couple of SDs of the last 9 seasons’ actual outcomes. If we overlay 2SD error bars (for the last 9 seasons only), we get a good picture of where potentially accurate predictions lie and, conversely, we see the potentially disadvantaged outliers.
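One way such a 2SD check could be carried out is sketched below. The nine (mean, SD) pairs and the two forecasts are invented placeholders, not the real Eredivisie figures, and the sample standard deviation is an assumption about how the error bars were drawn.

```python
from statistics import mean, stdev

# Hypothetical (mean, SD) pairs standing in for nine past seasons.
past_seasons = [(47.9, 17.2), (47.4, 16.5), (48.0, 18.1),
                (47.2, 16.0), (47.7, 17.6), (47.5, 16.8),
                (47.8, 17.9), (47.3, 16.3), (46.9, 14.6)]

def two_sd_band(values):
    """Return (low, high) limits two sample SDs either side of the average."""
    centre = mean(values)
    spread = stdev(values)   # sample SD across seasons (an assumption)
    return centre - 2 * spread, centre + 2 * spread

mean_band = two_sd_band([m for m, _ in past_seasons])
sd_band = two_sd_band([s for _, s in past_seasons])

def inside_band(forecast):
    """True if a forecast's (mean, SD) falls within both 2SD bands."""
    m, s = forecast
    return (mean_band[0] <= m <= mean_band[1]
            and sd_band[0] <= s <= sd_band[1])

# Hypothetical forecasts, labelled generically.
forecasts = {"Forecast A": (47.5, 16.9), "Forecast B": (49.5, 24.0)}
for name, shape in forecasts.items():
    print(name, "inside" if inside_band(shape) else "outlier")
```

A forecast that falls outside either band is not necessarily wrong, but it is betting on a season unlike any of the last nine.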
At this point it is worth noting the strong grouping of the stats models around a mean of about 46.8 and an SD of 14. This is quite far removed from the mean of the last 9 seasons and tends to indicate that the stats modellers expect a very competitive season.
Another way of capturing this predicted shape of the table is using the coefficient of variation (CV), which is the standard deviation divided by the mean.
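As a quick worked example, the CV can be computed for the two groupings mentioned earlier: the main cluster (mean about 47, SD about 16) and the stats models (mean about 46.8, SD 14). The function name is just for illustration.

```python
def coefficient_of_variation(sd, mean):
    """CV = standard deviation divided by the mean."""
    return sd / mean

main_cluster_cv = coefficient_of_variation(16, 47)
stats_models_cv = coefficient_of_variation(14, 46.8)

print(round(main_cluster_cv, 2), round(stats_models_cv, 2))
```

That gives roughly 0.34 for the main cluster and roughly 0.30 for the stats models. Because the CV is a ratio, it allows tables with slightly different means to be compared on the same scale.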
With the CV charted against the SD, a clear picture emerges of how the various forecasts line up. CVs between 0.28 and 0.5 are predicted, with a strong preference for about 0.31.
This analysis can be quite illuminating when last season’s data is overlaid.
The light-blue data points are for 2013/14 round by round – the CV starts out close to 1 and by round 9 it has begun to crystallise at about 0.28, which is where it eventually ends up. The maroon data points show the CV for the first 3 rounds of 2014/15.
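The starting value close to 1 follows from the arithmetic of a single round. A minimal sketch, assuming 9 matches in round one, 3 points for a win, 1 for a draw and the population standard deviation:

```python
from statistics import mean, pstdev

ROUND_ONE_MATCHES = 9   # 18 teams, so 9 matches in the opening round

def round_one_cv(draws):
    """CV of the table after one round with the given number of draws."""
    decisive = ROUND_ONE_MATCHES - draws
    # Winners have 3 points, drawing teams 1 point, losers 0 points.
    points = [3] * decisive + [1] * (2 * draws) + [0] * decisive
    return pstdev(points) / mean(points)

print(round_one_cv(0))   # no draws: nine teams on 3, nine on 0
print(round_one_cv(2))   # a couple of draws pulls the CV slightly below 1
```

With no draws, half the teams have 3 points and half have 0, so the mean and the SD are both 1.5 and the CV is exactly 1. As more rounds are played, the points totals grow faster than their spread and the CV settles towards its end-of-season value.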
None of this addresses the predicted order of the table, but both order and shape are necessary for an accurate prediction, so getting the shape right is half the battle.
Steve Lawrence is a semi-retired architect who researches age effects and their relationship with performance outcomes in sport. He uses the Twitter handle @SteveLawrence_ and is a very occasional blogger at http://theanalyticslab.tumblr.com
He is the principal author of the Wikipedia page on ‘Relative Age Effects’: http://en.wikipedia.org/wiki/Relative_age_effect and has developed the miTeamsheet app: http://www.miteamsheet.com (presently alpha testing version 3.5), which allows a rapid calculation of both the ‘average team age’ of any competing sports team and an index of ‘relative age bias’ within the team.
Steve maintains an interest in International Sports Law, EU State Aid Law in respect of Sports Facilities, Corruption in Sport and Sports Betting.