In my experience, there’s nobody more keenly aware of the shortcomings of data driven analysis in sport than people who are good at doing it. (Infuriatingly, people who are bad at doing it tend to be amongst the least aware of the shortcomings of data driven analysis and yet they keep doing it.) I’ve been watching Leaf games lately so that I get some joy out of watching a hockey game now and then. Joe Bowen and Greg Millen are masters of the snide comments about the hit or giveaway data in some building being garbage and then going on to imply that analytics is a waste of time. Nobody sane uses that data but that seems to be overlooked.
Two shortcomings that have always bothered me relate to quality of competition/teammate numbers and WOWY. The issue that I have with the quality of competition/teammate numbers relates to how they’re calculated. Generally speaking, these things are calculated based on all of the players who are on the ice. It treats each player as a 1/5 contributor to the number. That strikes me as likely to be wrong – we know that there are players who seem to make everyone better (Sidney Crosby) and that there are players who seem to make everyone worse (Jack Johnson). That’s not going to be captured by QOC/QOT analysis. Moreover, Eric Tulsky has suggested that the differences in quality of competition are so small they aren’t worth correcting for. That doesn’t really sit right with me.
This unsettled feeling ties into a concern I have with WOWYs. WOWYs, for the uninitiated, are With Or Without You analyses – looking at how a team does with and without a given player on the ice in order to suss out performance differences. I love WOWYs but there’s a problem with them – you aren’t quite making an apples to apples comparison. If you’re looking at how a number one defenceman – let’s say Duncan Keith in 2012-13, for reasons that will become clear – has performed relative to his team, his team probably plays easier minutes when he’s not on the ice. Nobody matches their top defensive pair against fourth lines and they’ll frequently gun for a specific line with it.
In 2012-13, Keith had a Corsi% of 52.4% in 832.73 minutes of 5v5 TOI. When he wasn’t on the ice, Chicago had a Corsi% of 55.0%. The obvious conclusion is that Duncan Keith makes the Hawks worse (or, perhaps more subtly, that his skills are hidden by how good the team is). Instinctively, I don’t agree with this either – Keith looks awesome when I watch the Hawks play, and he’s highly regarded by Professional Hockey Men. If quality of competition is of so little importance that it’s hardly worth adjusting for, then either a player’s effect on Corsi% is irrelevant, which I don’t believe at all, or Keith isn’t very good, which I don’t believe either.
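For anyone who wants the arithmetic behind a WOWY spelled out, here is a minimal Python sketch. The function name and the shot attempt counts are assumptions: the counts are placeholders chosen only so that the percentages match the 52.4% and 55.0% quoted above, since the real totals aren't given here.

```python
def corsi_pct(shot_attempts_for, shot_attempts_against):
    """Percentage of all shot attempts that belong to the team."""
    total = shot_attempts_for + shot_attempts_against
    if total == 0:
        return None  # no attempts either way, so the percentage is undefined
    return 100.0 * shot_attempts_for / total


# Placeholder counts (NOT the real 2012-13 totals), picked to reproduce
# the percentages quoted in the text.
on_ice = corsi_pct(524, 476)    # team with the player on the ice
off_ice = corsi_pct(550, 450)   # team with the player off the ice

# A plain WOWY is just the gap between the two numbers.
wowy_gap = on_ice - off_ice
print(f"with: {on_ice:.1f}%  without: {off_ice:.1f}%  gap: {wowy_gap:+.1f}")
```

Nothing in that gap accounts for who the opposition was or which forwards were out there, which is the apples to apples problem.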
About six years ago, Gabe Desjardins cranked out a rough estimate of how much an injury to a top pairing defenceman and a first line forward costs a team:
The overall winning percentage when players were in the line-up was close to .500 for both datasets, which we would expect since playing time is evenly distributed across good and bad teams. And for both forwards and defensemen, there was a clear difference in winning percentage when these players were out of the lineup. However, the difference was much greater for defensemen overall, and in four out of five seasons for which we have ice time data. Teams can expect to lose four extra games if one of their top two blueliners goes down for the season, while losing a first-line forward appears to cost just one and a half losses. The error bars on these estimates are large, but it’s clear that losing a defenseman is a bigger deal than losing a forward.
That’s a pretty big impact for a defenceman; presumably a star defender like Keith would be worth even more. It doesn’t really fit with what I’m seeing with Keith’s Corsi% though. I’m inclined to think that the vast majority of a defenceman’s value at 5v5 is tied up in how he impacts Corsi% – there’s a lot of evidence that defencemen don’t have a persistent impact on the shooting percentage or save percentage when they’re on the ice, and that’s all that’s left. If Keith isn’t having a positive impact on Corsi%, then what, exactly, is he doing?
Using the absolutely phenomenal tools that have been developed by Red Line Station, I looked into Keith’s 2012-13 season more deeply. I actually didn’t set out to do this; as God is my witness, this all started with me wanting to write something about Justin Schultz and Jeff Petry a week ago but things kind of got off the rails and here I am.
Given my unease with the QOC/QOT numbers and the inability to turn them into something quantitative, I decided to take a different approach to things. I have a spreadsheet that identifies the players who were on the ice for every second that the Hawks played at 5v5 in 2012-13 and whether any shot attempts were taken in that second. I restricted myself to using 5v5 situations with the goalies in and in which the Hawks had two defencemen and three forwards on the ice. This gave me 136,890 seconds with which to work.
I isolated every eight player grouping with which Keith was on the ice in 2012-13: five opposition players and three Hawk forwards. As it turns out, there are 3,354 of these groups. I then calculated how much time he spent with each group and the results that he obtained. So, for example, Keith spent 144 seconds playing with Jonathan Toews, Brandon Saad and Marian Hossa against Jeff Petry, Ladislav Smid, Taylor Hall, Jordan Eberle and Sam Gagner. The Hawks generated 1 shot attempt in that time and gave up 4.
To build a comparator, I looked at what happened when the same five opponents and three Hawk forwards were on the ice without Keith. In the case of these eight players, they spent 1 second on the ice together without Keith and the Hawks generated one shot attempt. Repeat 3,353 times.
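The grouping step can be sketched in Python roughly as follows. This is not the spreadsheet itself: the record layout (one dict per second with "forwards", "defence", "opponents", "saf" and "saa" fields), the function name and the defence partner labels are all assumptions made for illustration. The toy records reproduce the Edmonton example above (144 seconds and 1 attempt for, 4 against with Keith; 1 second and 1 attempt for without him), but which second each attempt lands in is arbitrary.

```python
from collections import defaultdict


def aggregate_by_group(seconds, player):
    """Total [seconds, attempts for, attempts against] per eight player group,
    split by whether `player` is one of the two defencemen on the ice."""
    with_p = defaultdict(lambda: [0, 0, 0])
    without_p = defaultdict(lambda: [0, 0, 0])
    for rec in seconds:
        # The group is the three Hawk forwards plus the five opponents.
        key = (frozenset(rec["forwards"]), frozenset(rec["opponents"]))
        totals = (with_p if player in rec["defence"] else without_p)[key]
        totals[0] += 1
        totals[1] += rec["saf"]
        totals[2] += rec["saa"]
    return dict(with_p), dict(without_p)


fwd = ("Toews", "Saad", "Hossa")
opp = ("Petry", "Smid", "Hall", "Eberle", "Gagner")


def second(defence, saf=0, saa=0):
    """Build one hypothetical per-second record."""
    return {"forwards": fwd, "defence": defence, "opponents": opp,
            "saf": saf, "saa": saa}


keith_pair = ("Keith", "Partner")   # partner name is a placeholder
other_pair = ("Other D1", "Other D2")  # placeholders as well

records = (
    [second(keith_pair, saf=1)]
    + [second(keith_pair, saa=1) for _ in range(4)]
    + [second(keith_pair) for _ in range(139)]
    + [second(other_pair, saf=1)]
)

with_keith, without_keith = aggregate_by_group(records, "Keith")
group = (frozenset(fwd), frozenset(opp))
print(with_keith[group], without_keith[group])
```

Any group that shows up in the first dict but not the second has no comparator.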
As you can see, I’m kind of freezing out quality of competition/teammates here by comparing the Hawks with Keith against certain opposition and with certain teammates to the Hawks without Keith against certain opposition and with certain teammates. My thinking is that if Keith’s an elite player, it should show up if we can make some sort of a rough apples to apples comparison.
Keith played 49,653 seconds of 5v5 TOI under the circumstances outlined above. Amazingly (to me anyway), 27,234 of those were in situations in which I can’t generate a comparator – there was no time played in which the Hawks had the same three forwards and the opposition the same five guys on the ice. That seems awfully high to me but with teams only facing other teams a couple of times a year and injuries and coaches running the line blender, it seems plausible.
I’ve organized the data based on how much time each eight man group spent with Keith and without him. So, for example, I’ve basically created two buckets for eight man groups that spent at least 60 seconds with Keith and at least 60 seconds without him, at least 45 seconds with Keith and 45 seconds without him, etc. I’ve also done the without Keith data both raw and weighted.
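As a rough illustration of the bucketing, here is a small Python sketch. The function name and the group labels are hypothetical, and the only numbers used are seconds that appear elsewhere in this post (144 with and 1 without for the Edmonton group, 45 with and 186 without for the San Jose group).

```python
def groups_meeting_threshold(with_seconds, without_seconds, min_seconds):
    """Groups that have at least `min_seconds` both with and without the player."""
    return [
        group
        for group, secs in with_seconds.items()
        if secs >= min_seconds and without_seconds.get(group, 0) >= min_seconds
    ]


# Seconds each eight man group spent with and without Keith.
with_seconds = {"EDM group": 144, "SJS group": 45}
without_seconds = {"EDM group": 1, "SJS group": 186}

buckets = {
    threshold: groups_meeting_threshold(with_seconds, without_seconds, threshold)
    for threshold in (60, 45, 20)
}
print(buckets)
```

The Edmonton group never qualifies at these thresholds because it has only one second without Keith, while the San Jose group enters once the threshold drops to 45 seconds.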
So, for example, the Hawks spent 45 seconds with Patrick Sharp, Dave Bolland and Patrick Kane facing Martin Havlat, Patrick Marleau, Dan Boyle, Logan Couture and Matt Irwin while Keith was on the ice. They spent 186 seconds in that situation without Keith on the ice. They got out-shot attempted 8-1 in those 186 seconds. I’ve re-weighted it by dividing 45/186 and multiplying by 1 SAF and 8 SAA to take into account that Keith didn’t face that situation as much.
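The re-weighting for that one group works out like this in Python. The function name is just for illustration; the inputs are the numbers from the San Jose example above.

```python
def reweight(with_seconds, without_seconds, without_saf, without_saa):
    """Scale the without-player shot attempts down (or up) to the amount of
    time the player actually spent with the same eight man group."""
    scale = with_seconds / without_seconds
    return without_saf * scale, without_saa * scale


# 45 seconds with Keith, 186 seconds without, 1 SAF and 8 SAA without him.
weighted_saf, weighted_saa = reweight(45, 186, 1, 8)
print(round(weighted_saf, 3), round(weighted_saa, 3))  # roughly 0.242 and 1.935
```

Scaling leaves the Corsi% of this single group unchanged (1 of 9 attempts either way). The weighting only matters once the groups are summed, because it stops a group that Keith barely played with from dominating the without-Keith total.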
If you trust my division skills and don’t need to see the numbers involved, here’s a simplified version of that table which just shows the applicable Corsi%.
Isn’t that fascinating? At a threshold of 20 seconds with Keith and 20 seconds without him, or basically anything above that, it looks like the Hawks are a much better team with Keith on the ice. This is as we would expect. I’m a bit leery of the data below that because of how line changes happen – the Hawks put up some big numbers when we take into account very small chunks of time without Keith. My suspicion is that that has to do with how defencemen change. They change when the puck is in the offensive zone or heading there. I’d expect those to be favourable circumstances for offence.
I note that we’re dealing with a pretty small sample of time – if you use the 20 second threshold, we’re comparing 175 minutes with Keith to 203 minutes without him. That said, it’s a much more precise comparison, in that we’re comparing him to circumstances without him that are exactly the same. Part of the reason that we want a bigger sample goes away when we can make direct comparisons like this. The closer we get to comparing apples to apples, the sooner the numbers should start to tell us things.
Intuitively, these results, using a 20 second threshold, make sense to me. By reputation and by my eye, Keith’s a great defenceman. If you looked at WOWY, you might not conclude that. Looked at this way, he looks like he makes the Hawks five or six percentage points of Corsi% better when he’s on the ice (keep in mind – some of that credit goes to his defence partner – I haven’t controlled for that). That’s a big change and one that seems to me like it’s more consistent with the size of the impact of a top pairing D that Desjardins has suggested.
I’ll leave this here. Any comments on the methodology and how it could be improved would be welcome.
Email Tyler Dellow at firstname.lastname@example.org