Thursday, April 28, 2016

Examining Leicester City: real life or fantasy?

If you're a football fan - that's soccer to my American reader [sic] - you've heard about what Leicester City is doing this year. It's a lovely story that not even an FFP scandal has been able to put much of a dent in. But how lucky have Leicester City been this season? If you're a lover of narrative you are probably thinking: "Well, probably at least a little and maybe a lot." And then shortly after that, maybe: "I'd like to punch this man in the face if he's about to ruin the fun." The reality is that almost every team in a competitive league needs luck to win the title. That's just the nature of a game where you play relatively few matches - although playing every team home and away is one of the fairest setups in sport - and where it is generally difficult to score goals (at least compared to other games). Trying to quantify that luck may seem like an attempt to undercut what Leicester have done, but I will tell you as a Liverpool fan: if you told me LFC could win the title but I'd have to endure a couple of stories about how they got lucky... well, insert your favorite cliche about what someone would do if they got something they've desperately been waiting for going on 25 years.

So in the spirit of trying to understand just a little better some of the luck it takes to win a title for a team that had 5000-1 pre-season odds, I decided to compare each team's expected points vs. their actual points. To calculate expected points I used a combination of expected GD (using Paul Riley’s xG methodology) and a GD-to-PTS conversion using the last 10 years of EPL data. The comparison of xPTS to PTS will not allow you to say definitively that teams with a positive differential were lucky and those with a negative differential unlucky, but it will suggest that. And the farther the differential is from zero, the more likely that is the case. Of course some of the differential is down to good (or bad) tactics, superior conditioning (possibly due to fewer games played), as well as many other reasons you could come up with. However, this measure should be good enough to at least suggest which teams were lucky and which teams were unlucky.
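
For anyone who wants to play along, here is a minimal sketch of the GD-to-PTS step in Python. It assumes a straight-line least-squares fit of points on goal difference, which is an assumption on my part rather than a statement of the exact conversion used here. The function names are hypothetical and the sample numbers are placeholders, not real EPL data.

```python
def fit_gd_to_pts(goal_diffs, points):
    """Least-squares fit of points = intercept + slope * goal difference."""
    n = len(goal_diffs)
    mean_gd = sum(goal_diffs) / n
    mean_pts = sum(points) / n
    sxx = sum((g - mean_gd) ** 2 for g in goal_diffs)
    sxy = sum((g - mean_gd) * (p - mean_pts)
              for g, p in zip(goal_diffs, points))
    slope = sxy / sxx
    intercept = mean_pts - slope * mean_gd
    return intercept, slope


def expected_points(xgd, intercept, slope):
    """Convert an expected goal difference into expected points (xPTS)."""
    return intercept + slope * xgd


# Placeholder season totals, made up for illustration only (NOT real data).
sample_gd = [-30, -10, 0, 10, 30]
sample_pts = [33.0, 46.0, 52.5, 59.0, 72.0]

intercept, slope = fit_gd_to_pts(sample_gd, sample_pts)

# A team's luck estimate is then actual points minus expected points.
xpts = expected_points(16, intercept, slope)
```

With real data you would feed in one (GD, PTS) pair per team-season and then plug each team's xGD into `expected_points`.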






According to this, Leicester has earned 19 more points than we would have expected. As I said, I do not believe all of that can be attributed to luck, but at least some of it likely can be, and perhaps much of it.

Another quick analysis is to look at performance in games decided by one goal or less. These games are theoretically on a knife’s edge: a single goal turns a win into a draw or a draw into a loss (and vice versa). If we go back to the 2000-2001 season, a team’s win% in games decided by more than one goal and their win% in games decided by one goal or less have a correlation of r = 0.48 (about 64% of games were decided by a goal or less). This means that if a team has a win% in non-one-goal games that is +1.0 standard deviation from the mean, we expect their win% in one-goal-or-less games to be about +0.5 standard deviations from the mean. LC’s performance in non-one-goal games is +1.5 standard deviations from the mean, so we expect them to be about +0.7 standard deviations better than the average performance of all teams in one-goal-or-less games. Instead, they are +2.1 standard deviations from the mean, 3 times our expectation, ranking 1st in the league for win% in games decided by one goal or less (win% = 54%, good for 9th in my entire sample). The next closest team this season is United, with a 44% win percentage in those games. On the other hand, if you're looking to understand why Spuds will fall short, they have been undefeated in non-one-goal games, but have been just a bit below average in games decided by one goal or less.
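
The arithmetic behind that expectation is just regression to the mean: the expected z-score in one set of games is r times the z-score in the other. A quick sketch using the numbers quoted above (the variable and function names are mine):

```python
def expected_z(r, z_other):
    """Expected z-score given correlation r and the z-score in the other split."""
    return r * z_other


r = 0.48                 # correlation between the two win percentages
z_non_one_goal = 1.5     # Leicester in games decided by more than one goal
z_close_observed = 2.1   # Leicester in games decided by one goal or less

z_close_expected = expected_z(r, z_non_one_goal)   # 0.48 * 1.5 = 0.72
ratio = z_close_observed / z_close_expected        # roughly 2.9, i.e. about 3x
```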

(+) As an aside, the best win% in games decided by one goal or less is United in 2008-2009, when they won an astonishing 67% of those games (they won 86% of all other games that season). As an LFC fan, this is yet another reason to hate them and lament finishing 2nd that year.

But what if we try to normalize for this extraordinary performance in close games? Using the following formulae, we can attempt to quantify how much of this performance is real ("talent") and how much is luck:

var(observed) = var(talent) + var(random)
var(random) = [p * (1-p)] / n
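
Here is a minimal sketch of how those two formulae can be put to work, reading p as the league-average win rate and n as the number of games per team (my reading; the formulae don't spell it out). The "games of league average to add" step uses a common shrinkage recipe, k = p(1-p) / var(talent), which I am assuming rather than asserting is the exact calculation. All function names and the example record are hypothetical.

```python
def talent_variance(observed_var, p, n):
    """var(talent) = var(observed) - var(random), with var(random) = p(1-p)/n."""
    random_var = p * (1 - p) / n
    return max(observed_var - random_var, 0.0)  # clamp: variance can't be negative


def regression_games(p, talent_sd):
    """Games of league-average play at which random variance equals talent variance."""
    return p * (1 - p) / talent_sd ** 2


def regress_to_mean(wins, games, p_league, k):
    """Pad a record with k games of league-average results."""
    return (wins + k * p_league) / (games + k)


# Games decided by more than one goal have no draws, so the league-wide
# win rate in them is 0.5. Talent SD of 0.22 is the figure for those games.
k_non_one_goal = regression_games(0.5, 0.22)   # about 5.17 games

# Hypothetical record (not a real team): 10 wins from 12 such games.
regressed = regress_to_mean(10, 12, 0.5, k_non_one_goal)
```

The same call works for one-goal-or-less games once you plug in the league-average win rate for that split, which is lower than 0.5 because of draws.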

The result is that the standard deviation of talent is roughly 0.07 for games decided by one goal or less and 0.22 for non-one-goal games, suggesting that there is about three times as much talent in play in games decided by more than a goal. These values also help us to determine how many games of league-average performance we should add to a team's record to get a better idea of their true talent. It turns out to be roughly 44 games of league average for one-goal-or-less games and only 5 for games decided by more than one goal. Doing this and then recalculating each team's points earned results in the following chart:

Using this method, Leicester City have earned 16 more points than we would have expected them to, quite close to the result we got using a somewhat independent** method. Across all teams, the correlation between points above expectation under the xGD method and points above expectation under this method is 0.85.

So a different way of looking at luck in performance also suggests that perhaps LC have been benefiting from some good fortune. Does this diminish what they've done? I suppose for some it will. I'm sure it won't make any Tottenham fans feel much better, although what can you really do when luck is against you? Remember that there's always next year.

**These two analyses, though coming at the question in different ways, are connected by the fact that the major reason a team earns more points than their GD would suggest is their performance in close games. So there is overlap, but the first analysis suggests 10 of the 19 additional points can be attributed to Leicester's actual GD exceeding their xGD by 16 goals. The other 9 points therefore could be attributed to other reasons, primarily performance in close games. To get even more technical, there is an interconnectedness that cannot entirely be disentangled so as to say these 10 were for this exact reason and these 9 were for other reasons. Anyway, though they are not completely independent, they were independent enough that I think they're both interesting to look at.

Friday, April 15, 2016

LIV 4 - 3 BVB (Agg 5 - 4), 14 April 2016


Chart corrected for data points prior to the 5th minute: