The Scoreboard Journalism challenge, which collects points and place predictions for both the Eredivisie and the Premier League from prominent media, stats modellers, fans and online publishers, has attracted considerable interest, and @JamesWGrayson has been publishing regular assessments of how the predictions are doing.
After Round 3 of the Eredivisie I posted some thoughts about the shape of the various predictions on https://scoreboardjournalism.wordpress.com/2014/08/25/predicting-the-eredivisie-a-guest-post-by-steve-lawrence/
It was clear that the predictions were clustered around a mean points total of about 47 with a standard deviation of about 16. My thought was that this SD was well below the mean SD of the last 9 seasons and tended to indicate that the forecasters expected a very competitive season.
In the previous post I made the case for plotting the coefficient of variation against the standard deviation as a way of visualising how match results were progressing against the forecasts. Applying that technique now shows the Eredivisie after 14 rounds to be less competitive than last season. One important characteristic that this form of analysis reveals is that the CV varies only within a narrow range after Round 9.
The shape of the table is developing with a standard deviation of 6.60, much higher than the 4.78 recorded after 14 rounds last season. The coefficient of variation, which is the SD divided by the mean, is settling down at around 0.35, which means that we can now begin to predict an end-of-season standard deviation of circa 17. That probably leaves many predictions (including mine) out of contention on the basis of the shape of the predicted table.
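The CV calculation and the projection above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the exact method used here: it uses the population standard deviation (the post does not say which variant was used), it assumes the CV stays where it is until the end of the season, and the 18-team table and the expected final mean of 47 points (the figure the predictions clustered around) are illustrative inputs, not real data.

```python
import statistics

def table_shape(points):
    """Return (mean, SD, CV) for a list of league points totals.

    Assumption: population SD (statistics.pstdev), not the sample SD.
    """
    mean = statistics.fmean(points)
    sd = statistics.pstdev(points)
    return mean, sd, sd / mean

def projected_final_sd(cv, expected_final_mean):
    """Project the end-of-season SD, assuming the current CV holds steady."""
    return cv * expected_final_mean

# Hypothetical 18-team table after 14 rounds (made-up numbers).
after_round_14 = [36, 33, 30, 27, 25, 23, 22, 21, 20,
                  19, 18, 17, 16, 14, 13, 12, 10, 8]
mean, sd, cv = table_shape(after_round_14)
print(f"mean={mean:.2f} sd={sd:.2f} cv={cv:.2f}")
# Assumed expected final mean of 47 points for an 18-team Eredivisie season.
print(f"projected final SD ~ {projected_final_sd(cv, 47):.1f}")
```

Because the CV is a ratio, the same shape can be tracked from round to round even though both the mean and the SD keep growing as points accumulate.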
I would say that the story for the Eredivisie is that the forecasters predicted a closer race than is actually transpiring, and it seems to me that the story for the Premier League is similar, albeit that the Premier League points distribution has a similar shape to last season's.
After 14 rounds the standard deviation of the points distribution is 6.94, against 7.34 at the same stage last season, and the coefficient of variation has settled at around 0.36, which for me points to an expected end-of-season SD of around 18.5. Only a handful of the predictions are close to these figures, and it looks unlikely that the CV will dip to 0.3 unless QPR, Hull, Burnley and Leicester City all experience some kind of Christmas epiphany and Chelsea sack Mourinho.
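Turning a CV into an expected SD also requires an expected end-of-season mean. In a double round-robin league that mean is fixed apart from draws: each decisive match awards 3 points in total and each draw awards 2, so total points equal 3 × matches − draws. The sketch below illustrates that arithmetic; it is not necessarily how the 18.5 figure above was derived, and the draw count of 95 (a hypothetical 25% draw rate over 380 Premier League matches) is an assumed input.

```python
def expected_mean_points(teams, draws):
    """Mean end-of-season points per team in a double round-robin league.

    A decisive match awards 3 points in total, a draw awards 2,
    so total points = 3 * matches - draws.
    """
    matches = teams * (teams - 1)  # every pair meets home and away
    total_points = 3 * matches - draws
    return total_points / teams

# Hypothetical input: 95 draws in a 20-team, 380-match season.
mean_points = expected_mean_points(20, 95)
print(f"expected mean points: {mean_points:.2f}")
print(f"projected SD at CV 0.36: {0.36 * mean_points:.1f}")
```

Across realistic draw rates the mean only moves by a few points, so the projected SD is driven mainly by the CV.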
Whilst the CV v SD analysis gives us a feel for the emerging shape of the points tables for the Eredivisie and the Premier League, it doesn’t analyse the table order. To do that I’ve been experimenting with Pearson’s correlation coefficient (r), plotting it against the coefficient of variation, which gives an interesting graphical representation of what’s going on.
My view is that if you’ve got the CV about right then you’ve got the mean and the SD about right – so a pretty good shape for the points distribution. If, in addition, you have a high value of r then you’ve also got the order about right.
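Pearson’s r measures the linear association between two sequences. A dependency-free sketch is below. What goes into the two sequences is an assumption, since the post does not say whether predicted points are correlated with current points or predicted positions with current positions (for order-only media predictions, positions would be the natural input). The six-team values are made up. On Python 3.10+, `statistics.correlation` computes the same quantity.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences of at least 2 values")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    # Undefined (division by zero) if either sequence is constant.
    return cov / (sx * sy)

# Hypothetical predicted final points v actual points after 14 rounds.
predicted = [85, 78, 70, 62, 45, 38]
actual = [36, 30, 29, 22, 15, 12]
print(f"r = {pearson_r(predicted, actual):.3f}")
```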
Graph showing the coefficient of variation plotted against Pearson’s correlation coefficient for the points distribution in the Premier League after Round 14
The CV for each forecast is fixed for the season by the prediction made, but the r value changes week by week depending on how closely the predicted order resembles the actual order. (Most of the media predictions share the same CV because they predicted order only and were allotted the mean CV of the last 9 seasons.) As a reference point, randomly generated points tables would tend to show a mean r of 0.5, so after 14 rounds all predictions are doing better than random.
The final table will have an r of 1 (correlated with itself, it is what it is), but its CV will depend on the mean points and the SD of those points; the yellow data point shows the most recent CV.
So in my analysis the winner will be the forecast whose data point lies closest to the yellow marker, which represents the actual outcome: its r will always be 1, and its CV gives a pretty good representation of the table’s shape.
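One way to make “closest to the yellow marker” concrete is the straight-line distance on the plot from each forecast’s (CV, r) point to (actual CV, 1). How proximity is measured is an assumption here, as are the `cv_weight` parameter and the forecaster names and values, which are all hypothetical.

```python
import math

def distance_to_outcome(forecast_cv, forecast_r, actual_cv, cv_weight=1.0):
    """Euclidean distance on the CV v r plot to the outcome point (actual_cv, 1).

    cv_weight (hypothetical) scales the CV gap relative to the r gap.
    """
    return math.hypot(cv_weight * (forecast_cv - actual_cv), forecast_r - 1.0)

# Hypothetical forecasts as (CV, r) pairs, for illustration only.
forecasts = {
    "forecaster_a": (0.34, 0.85),
    "forecaster_b": (0.30, 0.88),
    "forecaster_c": (0.25, 0.90),
}
actual_cv = 0.36

# Rank forecasts from closest to furthest from the outcome marker.
ranking = sorted(forecasts,
                 key=lambda name: distance_to_outcome(*forecasts[name], actual_cv))
for name in ranking:
    print(name, round(distance_to_outcome(*forecasts[name], actual_cv), 4))
```

In this made-up example forecaster_a has the CV closest to the outcome yet ranks last, because r spans a much wider range than CV and so dominates an unweighted distance; a larger `cv_weight` would shift the balance towards shape.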
@Etnar_UK is the frontrunner, with a good correlation between predicted order and actual order and a predicted shape, as measured by mean and standard deviation, that is close to the actual shape.
@jasonlemiere and @jackpittbrooke have a good predicted order and quite a good shape prediction based on the mean of the last 9 seasons.
@JamesWGrayson has a good predicted order but has anticipated a shape for the points distribution which is looking unlikely at the moment.
See below for the CV v r table which allows all predictions to be identified on the plot. I’ll post again after round 25.
Steve Lawrence researches age effects and their relationship with performance outcomes in sport. He uses the Twitter handle @SteveLawrence_ and is a very occasional blogger at http://theanalyticslab.tumblr.com
Steve is the developer of the miTeamsheet app (http://www.miteamsheet.com), which allows rapid calculation of both the ‘average team age’ (ATA) of any competing sports team and an index of ‘relative age bias’ (RAEi) within the team.
Steve maintains an interest in International Sports Law, EU State Aid Law in respect of Sports Facilities, Corruption in Sport and Sports Betting.
So for the Premier League this season, after Round 14, the field is stretching out and my analysis looks like this: