Duncan Keith and a True WOWY

    by  • January 13, 2014 • Hockey • 11 Comments

    In my experience, there’s nobody more keenly aware of the shortcomings of data-driven analysis in sport than people who are good at doing it. (Infuriatingly, people who are bad at doing it tend to be among the least aware of those shortcomings, and yet they keep doing it.) I’ve been watching Leaf games lately so that I get some joy out of watching a hockey game now and then. Joe Bowen and Greg Millen are masters of the snide comment about the hit or giveaway data in some building being garbage, and they then go on to imply that analytics is a waste of time. Nobody sane uses that data, but that seems to be overlooked.

    Two shortcomings that have always bothered me relate to quality of competition/teammate numbers and WOWY. The issue that I have with the quality of competition/teammate numbers relates to how they’re calculated. Generally speaking, these things are calculated based on all of the players who are on the ice. It treats each player as a 1/5 contributor to the number. That strikes me as likely to be wrong – we know that there are players who seem to make everyone better (Sidney Crosby) and that there are players who seem to make everyone worse (Jack Johnson). That’s not going to be captured by QOC/QOT analysis. Moreover, Eric Tulsky has suggested that the differences in quality of competition are so small they aren’t worth correcting for. That doesn’t really sit right with me.

    This unsettled feeling ties into a concern I have with WOWYs. WOWYs, for the uninitiated, are With Or Without You analyses – looking at how a team does without a given player on the ice in order to suss out performance differences. I love WOWYs, but there’s a problem with them – you aren’t quite making an apples to apples comparison. If you’re looking at how a number one defenceman – let’s say Duncan Keith in 2012-13, for reasons that will become clear – has performed relative to his team, his team probably plays easier minutes when he’s not on the ice. Nobody matches their top defensive pair against fourth lines, and teams will frequently gun for a specific line with it.

    In 2012-13, Keith had a Corsi% of 52.4% in 832.73 minutes of 5v5 TOI. When he wasn’t on the ice, Chicago had a Corsi% of 55.0%. The obvious conclusion is that Duncan Keith makes the Hawks worse (or, perhaps more subtly, that his skills are hidden by how good the team is). Instinctively, I don’t agree with this either – Keith looks awesome when I watch the Hawks play, and he’s highly regarded by Professional Hockey Men. If quality of competition is of so little importance that it’s hardly worth adjusting for, then either a player’s effect on Corsi% is irrelevant, which I don’t believe at all, or Keith isn’t very good, which I don’t believe either.
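    The conventional on/off WOWY described in this paragraph can be sketched in a few lines of Python. This is a minimal illustration, assuming a hypothetical per-second record (the `Second` type and its fields are made up here, not the format of the spreadsheet used in this post) that holds the five Chicago skaters on the ice plus the shot attempts for and against in that second.

```python
from typing import FrozenSet, Iterable, NamedTuple, Tuple


class Second(NamedTuple):
    """One second of 5v5 play (hypothetical record layout)."""
    hawk_skaters: FrozenSet[str]  # the five Chicago skaters on the ice
    saf: int  # Chicago shot attempts in this second
    saa: int  # opponent shot attempts in this second


def corsi_pct(saf: int, saa: int) -> float:
    """Shot attempts for as a percentage of all shot attempts."""
    total = saf + saa
    return 100.0 * saf / total if total else float("nan")


def on_off_corsi(seconds: Iterable[Second], player: str) -> Tuple[float, float]:
    """Classic WOWY: team Corsi% with the player on the ice vs. off it."""
    on, off = [0, 0], [0, 0]  # [SAF, SAA] accumulators
    for s in seconds:
        bucket = on if player in s.hawk_skaters else off
        bucket[0] += s.saf
        bucket[1] += s.saa
    return corsi_pct(on[0], on[1]), corsi_pct(off[0], off[1])


# Toy data with placeholder names, not real 2012-13 numbers.
with_keith = frozenset({"Keith", "D2", "F1", "F2", "F3"})
without_keith = frozenset({"D3", "D4", "F1", "F2", "F3"})
demo = [Second(with_keith, 1, 0), Second(with_keith, 0, 1), Second(without_keith, 1, 0)]
print(on_off_corsi(demo, "Keith"))  # (50.0, 100.0)
```

    Note that nothing in this split controls for who the skaters were facing, which is the gap the rest of the post tries to close.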

    About six years ago, Gabe Desjardins cranked out a rough estimate of how much an injury to a top pairing defenceman and a first line forward costs a team:

    The overall winning percentage when players were in the line-up was close to .500 for both datasets, which we would expect since playing time is evenly distributed across good and bad teams. And for both forwards and defensemen, there was a clear difference in winning percentage when these players were out of the lineup. However, the difference was much greater for defensemen overall, and in four out of five seasons for which we have ice time data. Teams can expect to lose four extra games if one of their top two blueliners goes down for the season, while losing a first-line forward appears to cost just one and a half losses. The error bars on these estimates are large, but it’s clear that losing a defenseman is a bigger deal than losing a forward.

    That’s a pretty big impact for a defenceman; presumably a star defender like Keith would be worth even more. It doesn’t really fit with what I’m seeing in Keith’s Corsi%, though. I’m inclined to think that the vast majority of a defenceman’s value at 5v5 is tied up in how he impacts Corsi% – there’s a lot of evidence that defencemen don’t have a persistent impact on the shooting percentage or save percentage when they’re on the ice, and that’s all that’s left. If Keith isn’t having a positive impact on Corsi%, then what, exactly, is he doing?

    Using the absolutely phenomenal tools that have been developed by Red Line Station, I looked into Keith’s 2012-13 season more deeply. I actually didn’t set out to do this; as God is my witness, this all started with me wanting to write something about Justin Schultz and Jeff Petry a week ago but things kind of got off the rails and here I am.

    Given my unease with the QOC/QOT numbers and the inability to turn them into something quantitative, I decided to take a different approach. I have a spreadsheet that identifies the players who were on the ice for every second that the Hawks played at 5v5 in 2012-13 and whether any shot attempts occurred in that second. I restricted myself to 5v5 situations with the goalies in and in which the Hawks had two defencemen and three forwards on the ice. This gave me 136,890 seconds with which to work.
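    The restriction described above can be sketched as a predicate over per-second records. The `RawSecond` layout, its field names and the "F"/"D" position codes are assumptions for illustration only; the post doesn’t describe the spreadsheet’s actual columns.

```python
from typing import Dict, FrozenSet, NamedTuple


class RawSecond(NamedTuple):
    """One second of game data (hypothetical layout)."""
    hawk_skaters: Dict[str, str]  # Chicago skater -> position, "F" or "D"
    opp_skaters: FrozenSet[str]   # opposing skaters on the ice
    hawk_goalie_in: bool
    opp_goalie_in: bool
    saf: int  # Chicago shot attempts this second
    saa: int  # opponent shot attempts this second


def keep(sec: RawSecond) -> bool:
    """5v5, both goalies in, and Chicago with exactly three forwards and two defencemen."""
    positions = list(sec.hawk_skaters.values())
    return (
        len(positions) == 5
        and len(sec.opp_skaters) == 5
        and sec.hawk_goalie_in
        and sec.opp_goalie_in
        and positions.count("F") == 3
        and positions.count("D") == 2
    )


opp = frozenset({"O1", "O2", "O3", "O4", "O5"})
ok = RawSecond({"F1": "F", "F2": "F", "F3": "F", "D1": "D", "D2": "D"}, opp, True, True, 0, 0)
four_forwards = RawSecond({"F1": "F", "F2": "F", "F3": "F", "F4": "F", "D1": "D"}, opp, True, True, 0, 0)
print(keep(ok), keep(four_forwards))  # True False
```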

    I isolated every eight-player grouping with which Keith was on the ice in 2012-13: five opposition players and three Hawk forwards. As it turns out, there are 3,354 of these groups. I then calculated how much time he spent with each group and the results he obtained. So, for example, Keith spent 144 seconds playing with Jonathan Toews, Brandon Saad and Marian Hossa against Jeff Petry, Ladislav Smid, Taylor Hall, Jordan Eberle and Sam Gagner. The Hawks generated 1 shot attempt in that time and gave up 4.
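    One way to sketch the grouping step: key each second by the three Chicago forwards plus the five opponents, and total up time and shot attempts whenever the defenceman of interest is on the ice. The `Second` record and the placeholder names are hypothetical; frozensets are used so that the key doesn’t depend on the order in which players happen to be listed.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Iterable, List, NamedTuple, Tuple


class Second(NamedTuple):
    """One filtered 5v5 second (hypothetical layout)."""
    forwards: FrozenSet[str]   # the three Chicago forwards
    defence: FrozenSet[str]    # the two Chicago defencemen
    opponents: FrozenSet[str]  # the five opposing skaters
    saf: int
    saa: int


# An eight-player group: three Chicago forwards plus five opponents.
GroupKey = Tuple[FrozenSet[str], FrozenSet[str]]


def groups_with(seconds: Iterable[Second], player: str) -> Dict[GroupKey, List[int]]:
    """[seconds, SAF, SAA] for every eight-player group the defenceman played with."""
    totals: Dict[GroupKey, List[int]] = defaultdict(lambda: [0, 0, 0])
    for s in seconds:
        if player in s.defence:
            t = totals[(s.forwards, s.opponents)]
            t[0] += 1  # one more second with this group
            t[1] += s.saf
            t[2] += s.saa
    return dict(totals)


fw = frozenset({"F1", "F2", "F3"})
opp_a = frozenset({"A1", "A2", "A3", "A4", "A5"})
opp_b = frozenset({"B1", "B2", "B3", "B4", "B5"})
pair = frozenset({"Keith", "D2"})
demo = [
    Second(fw, pair, opp_a, 0, 1),
    Second(fw, pair, opp_a, 1, 0),
    Second(fw, pair, opp_b, 0, 0),
    Second(fw, frozenset({"D3", "D4"}), opp_a, 1, 0),  # Keith off: ignored here
]
result = groups_with(demo, "Keith")
print(len(result))  # 2
```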

    To build a comparator, I looked at what happened when the same five opponents and three Hawk forwards were on the ice without Keith. In the case of these eight players, they spent 1 second on the ice together without Keith and the Hawks generated one shot attempt. Repeat 3,353 times.
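    The comparator step can be sketched by tallying the same eight-player keys with and without the player and keeping only keys that show up both ways. This is an illustrative sketch, not the post’s actual spreadsheet logic; the `Second` record and the partner/other-pair names are hypothetical. The demo reproduces the totals of the Toews/Saad/Hossa example above, but how the attempts are spread across individual seconds is made up.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Iterable, List, NamedTuple, Tuple


class Second(NamedTuple):
    forwards: FrozenSet[str]   # three Chicago forwards
    defence: FrozenSet[str]    # two Chicago defencemen
    opponents: FrozenSet[str]  # five opposing skaters
    saf: int
    saa: int


GroupKey = Tuple[FrozenSet[str], FrozenSet[str]]  # (three forwards, five opponents)
Stats = Tuple[int, ...]                           # (seconds, SAF, SAA)


def with_and_without(seconds: Iterable[Second], player: str) -> Dict[GroupKey, Tuple[Stats, Stats]]:
    """Pair each eight-player group's results with and without the player.

    Groups that never appeared without the player have no comparator and are dropped.
    """
    with_p: Dict[GroupKey, List[int]] = defaultdict(lambda: [0, 0, 0])
    without_p: Dict[GroupKey, List[int]] = defaultdict(lambda: [0, 0, 0])
    for s in seconds:
        bucket = with_p if player in s.defence else without_p
        t = bucket[(s.forwards, s.opponents)]
        t[0] += 1
        t[1] += s.saf
        t[2] += s.saa
    return {
        key: (tuple(w), tuple(without_p[key]))
        for key, w in with_p.items()
        if key in without_p
    }


fw = frozenset({"Toews", "Saad", "Hossa"})
oilers = frozenset({"Petry", "Smid", "Hall", "Eberle", "Gagner"})
keith_pair = frozenset({"Keith", "Partner"})  # partner left unnamed
other_pair = frozenset({"D3", "D4"})          # hypothetical
demo = (
    [Second(fw, keith_pair, oilers, 1, 0)]
    + [Second(fw, keith_pair, oilers, 0, 1)] * 4
    + [Second(fw, keith_pair, oilers, 0, 0)] * 139
    + [Second(fw, other_pair, oilers, 1, 0)]
)
print(with_and_without(demo, "Keith")[(fw, oilers)])  # ((144, 1, 4), (1, 1, 0))
```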

    As you can see, I’m kind of freezing out quality of competition/teammates here by comparing the Hawks with Keith against certain opposition and with certain teammates to the Hawks without Keith against the same opposition and with the same teammates. My thinking is that if Keith’s an elite player, it should show up once we can make some sort of rough apples to apples comparison.

    Keith played 49,653 seconds of 5v5 TOI under the circumstances outlined above. Amazingly (to me anyway), 27,234 of those were in situations in which I can’t generate a comparator – there was no time played in which the Hawks had the same three forwards and the opposition the same five guys on the ice. That seems awfully high to me but with teams only facing other teams a couple of times a year and injuries and coaches running the line blender, it seems plausible.
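    The split in this paragraph is easy to check. The two inputs are the figures given above; the remainder is the time for which a comparator does exist.

```python
total_with_keith = 49_653  # seconds of qualifying 5v5 time with Keith (from the post)
no_comparator = 27_234     # seconds with no matching without-Keith time (from the post)

with_comparator = total_with_keith - no_comparator
print(with_comparator)                             # 22419
print(f"{no_comparator / total_with_keith:.1%}")  # 54.8%
```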

    I’ve organized the data based on how much time each eight-man group spent with Keith and without him. For each threshold, I’ve basically created two buckets (time with Keith and time without him) out of the eight-man groups that spent at least that long both with and without Keith: at least 60 seconds with Keith and at least 60 seconds without him, at least 45 seconds with Keith and 45 seconds without him, etc. I’ve also done the without-Keith data both raw and weighted.
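    A sketch of the threshold bucketing, assuming the paired per-group results are already in hand (for example from a with/without tally like the one sketched earlier). The input shape, group key mapped to `(seconds, SAF, SAA)` with and without the player, is an assumption, and the toy numbers are invented.

```python
from typing import Dict, Hashable, Iterable, Tuple

Stats = Tuple[int, ...]  # (seconds, SAF, SAA)


def bucket_totals(
    groups: Dict[Hashable, Tuple[Stats, Stats]], thresholds: Iterable[int]
) -> Dict[int, Tuple[Stats, Stats]]:
    """For each threshold, pool the groups that spent at least that many seconds
    both with and without the player, keeping the with/without buckets separate."""
    out = {}
    for thr in thresholds:
        w, wo = [0, 0, 0], [0, 0, 0]
        for with_p, without_p in groups.values():
            if with_p[0] >= thr and without_p[0] >= thr:
                for i in range(3):
                    w[i] += with_p[i]
                    wo[i] += without_p[i]
        out[thr] = (tuple(w), tuple(wo))
    return out


# Toy input: group key -> ((secs, SAF, SAA) with, (secs, SAF, SAA) without).
toy = {"g1": ((60, 3, 1), (70, 2, 2)), "g2": ((30, 1, 0), (25, 0, 1))}
print(bucket_totals(toy, [60, 45, 20]))
```

    Lower thresholds pull in more groups (and more time), at the cost of including the very short stints that the post treats with suspicion.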

    So, for example, the Hawks spent 45 seconds with Patrick Sharp, Dave Bolland and Patrick Kane facing Martin Havlat, Patrick Marleau, Dan Boyle, Logan Couture and Matt Irwin while Keith was on the ice. They spent 186 seconds in that situation without Keith on the ice, and were out-attempted 8-1 in those 186 seconds. I’ve re-weighted that by multiplying the 1 shot attempt for (SAF) and the 8 shot attempts against (SAA) by 45/186, to take into account that Keith didn’t face that situation as much.
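    The re-weighting can be sketched as scaling each group’s without-Keith attempts by (time with Keith) / (time without Keith) before pooling. This is a minimal sketch under that reading of the paragraph; the function names and the `(seconds, SAF, SAA)` tuple shape are assumptions.

```python
from typing import Iterable, Tuple

Stats = Tuple[int, int, int]  # (seconds, SAF, SAA)


def reweight(with_secs: int, without: Stats) -> Tuple[float, float]:
    """Scale without-player SAF/SAA by (time with player) / (time without player)."""
    without_secs, saf, saa = without
    factor = with_secs / without_secs
    return saf * factor, saa * factor


def weighted_without_totals(groups: Iterable[Tuple[Stats, Stats]]) -> Tuple[float, float]:
    """Sum the re-weighted without-player attempts across groups."""
    saf_total = saa_total = 0.0
    for with_p, without_p in groups:
        saf, saa = reweight(with_p[0], without_p)
        saf_total += saf
        saa_total += saa
    return saf_total, saa_total


# The Sharp/Bolland/Kane example: 45 s with Keith; 186 s without, out-attempted 8-1.
saf_w, saa_w = reweight(45, (186, 1, 8))
print(round(saf_w, 3), round(saa_w, 3))  # 0.242 1.935
```

    Scaling a single group leaves its own Corsi% unchanged (1 of 9 attempts either way, about 11.1%). The weighting only matters once groups are pooled, where it stops a situation Keith rarely saw from outweighing the ones he played through.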

    Anyway…the data:

    If you trust my division skills and don’t need to see the numbers involved, here’s a simplified version of that table which just shows the applicable Corsi%.

    Isn’t that fascinating? If you set a threshold of 20 seconds with Keith and 20 seconds without him, or basically anything above that, the Hawks look like a much better team with Keith on the ice. This is as we would expect. I’m a bit leery of the data below that threshold because of how line changes happen – the Hawks put up some big numbers when we take into account very small chunks of time without Keith. My suspicion is that this has to do with how defencemen change: they change when the puck is in the offensive zone or heading there, and I’d expect those to be favourable circumstances for offence.

    I note that we’re dealing with a pretty small sample of time – if you use the 20 second threshold, we’re comparing 175 minutes with Keith to 203 minutes without him. That said, it’s a much more precise comparison, in that we’re comparing him to circumstances without him in which the same three forwards and the same five opponents are on the ice. Part of the reason that we want a bigger sample goes away when we can make direct comparisons like this. The closer we get to comparing apples to apples, the sooner the numbers should start to tell us things.

    Intuitively, these results, using a 20-second threshold, make sense to me. By reputation and by my eye, Keith’s a great defenceman. If you looked at a standard WOWY, you might not conclude that. Looked at this way, he looks like he makes the Hawks five or six points of Corsi% better when he’s on the ice (keep in mind that some of that credit goes to his defence partner; I haven’t controlled for that). That’s a big change, and one that seems to me more consistent with the size of the impact of a top pairing D that Desjardins has suggested.

    I’ll leave this here. Any comments on the methodology and how it could be improved would be welcome.

    Email Tyler Dellow at tyler@mc79hockey.com


    11 Responses to Duncan Keith and a True WOWY

    1. January 13, 2014 at

      Now just do that for every player in the league and let’s see how this works out ;)

    2. aaron
      January 13, 2014 at

      Awesome post. Also, at what point do we give in and start calling WOWYs Bonos?

    3. January 13, 2014 at

      Fascinating breakdown. Question for you: you theorize that the shorter time lengths are better without Keith because of when D-men change. But shouldn’t that be true for all D-men? If so, shouldn’t all D-men get that boost to their without numbers? I’m questioning how that can be possible.

      (I could just be missing something here)

      • Tyler Dellow
        January 13, 2014 at

        A lot of Keith’s small numbers – 5, 10 seconds – will be when a forward changes on one or the other teams. I’d think that leaves him a lot more open to getting tagged going either way. Forwards are far more likely to change with the play going towards the defensive end than defencemen are.

        That’s my theory anyway.

    4. January 13, 2014 at

      Cool stuff. I’m trying to think how this meshes with what I saw. Here’s where I end up:

      Suppose that the average of the five opponents’ Corsi or Corsi Rel isn’t a good assessment of their ability, because it doesn’t take into account how the opponents were used.

      If you’re on a top line and often start in the offensive zone, then the guys you face will tend to be players who start in the defensive zone and take on the opposing top line, which will reduce their Corsi and Corsi Rel and make them look like weaker competition than they really are.

      In that case, the spread of qualcomp is larger than what Corsi QoC or Corsi Rel QoC imply. A more controlled assessment like what you’ve done here — or a more comprehensive one like Vic Ferrari’s approach factoring in QualTeam and zone starts — might be expected to find bigger differences between players.

      But there’s also something weird going on here. You have a total of 359 shot attempts for and 263 shot attempts against when Keith was on the ice for the times that you have off-ice comparisons, a 57.7% Corsi. I have him with 753 SAF and 683 SAA overall; your numbers might be a smidge different depending on data handling, but hopefully they’re pretty close.

      That means he’s 394-420 in the period for which we don’t have an off-ice comparison, a 48.4% Corsi. That seems pretty strange. I’ve tried a few explanations, but haven’t come up with anything I buy. (“Maybe those are minutes where they got caught with a screwy line on the ice and got clobbered…but wouldn’t it also include minutes where the opponent did the same?” “Maybe those are minutes where they’re in the middle of a line change as they enter the zone and get shots…but wouldn’t it also include minutes where the opponent did the same?”)

      Do you have an explanation (other than random chance) for why he’d have done so well in the minutes where we have a comparison and so meh in the minutes where we don’t?

      Something that might help inform that explanation: the shot rates are *really* different. You have 622 shot attempts in 27234 seconds with a comparison — 1.37 per minute. That means there are about 814 shot attempts in the 22419 seconds without a comparison — 2.18 per minute. Something strange is going on; I just don’t know what it is.

      • Pierce Cunneen
        January 13, 2014 at

        Eric,

        I get what you’re saying about the offensive guys’ Corsi QoC being pulled downward because they are playing against guys who start more in the D-zone. But wouldn’t just the opposite happen for the defensive guys?

        Those players would have artificially high Corsi QoC because their opponents start a lot in the offensive zone.

        Looking at the guys with the highest QoC, it’s all guys with <50% zone start %.

        • January 13, 2014 at

          Yes, the opposite is true too.

          If you’re a strong player who often starts in the offensive zone, you generally face opponents who start in their defensive zone and face the opponent’s best, which lowers their Corsi and makes them look like easier competition than they really are.

          If you’re a weak player who often starts in the defensive zone, you generally face opponents who start in their offensive zone and face the opponent’s worst, which raises their Corsi and makes them look like tougher competition than they really are.

          If you’re a strong player who starts in the defensive zone or a weak player who starts in the offensive zone, then the effects are going in opposite directions and will cancel to some degree.

          But the point is that the less accurately Corsi Rel QoC captures the quality of opponents you face, the less steep of a slope we’ll measure when we look at how your results change as your Corsi Rel QoC changes — and the less important competition (as measured by Corsi Rel QoC) will appear to be.

          • Pierce Cunneen
            January 13, 2014 at

            Ah, last paragraph is the money paragraph there.

            Makes sense.

      • Tyler Dellow
        January 13, 2014 at

        You have a total of 359 shot attempts for and 263 shot attempts against when Keith was on the ice for the times that you have off-ice comparisons, a 57.7% Corsi. I have him with 753 SAF and 683 SAA overall; your numbers might be a smidge different depending on data handling, but hopefully they’re pretty close.

        I have 749/676, so it looks like we’re on the same page here.

        You have 622 shot attempts in 27234 seconds with a comparison — 1.37 per minute. That means there are about 814 shot attempts in the 22419 seconds without a comparison — 2.18 per minute. Something strange is going on; I just don’t know what it is.

        I think you’ve got this backward – maybe I explained it poorly. I get 622 shot attempts in 22419 seconds with a comparison – not 27234 seconds with a comparison. That’s 1.66 shot attempts per minute. That gives me 1.77 shot attempts per minute in the time for which there’s no comparator. Looks close enough to me that I’d say it’s accurate.

        That means he’s 394-420 in the period for which we don’t have an off-ice comparison, a 48.4% Corsi. That seems pretty strange. I’ve tried a few explanations, but haven’t come up with anything I buy. (“Maybe those are minutes where they got caught with a screwy line on the ice and got clobbered…but wouldn’t it also include minutes where the opponent did the same?” “Maybe those are minutes where they’re in the middle of a line change as they enter the zone and get shots…but wouldn’t it also include minutes where the opponent did the same?”)

        I go back to “When do defencemen change?” for this. Keep in mind, in order for there to be a comparator generated, I need the same five opposition + the same three forwards. That means that I’m limiting myself to changes of defencemen – Keith leaving the ice. Keith isn’t leaving the ice with the puck headed towards Chicago’s end. I would guess that a lot of what I’m capturing is “Keith leaves ice, shot attempt is recorded, puck is frozen, forwards are changed.” That’s why I get the short comparator and that’s why they’re so positive.

        On the flip side of that, the situations with Keith having a short possession and getting smoked, they can be anyone changing, creating that short possession. I assume that’s where it comes from. Make sense?

    5. Donair Poutine
      January 15, 2014 at

      Sidebar: I’ve always had a quibble with how WOWYs are used to compare linemates, i.e. whether Hall or Eberle is “driving the bus” on their line. The WOWYs always showed that Hall was the straw and Eberle the drink (and I think that’s true), but it wasn’t really apples to apples.

      Hall without Eberle was still playing with Hemsky or Yakupov; Eberle without Hall was playing with any one of Paajarvi, Smyth and Jones, which wasn’t exactly an equivalent group of skaters. This isn’t a perfect example of my point (I don’t think the numbers are wrong, though they were likely exaggerated), but it comes up fairly often.

    6. Jesse Dahl
      January 31, 2014 at

      Did you use Excel for any of the number crunching, or was this all generated from the Java project you linked to?

      Regardless of choice of tool:
      How long did it take to calculate all the WOWY stuff and how much space did it consume?
