ZoneStarts and Defencemen

    by  • April 17, 2013 • Hockey • 25 Comments

    There was an interesting discussion that kicked off on Twitter about defencemen and ZoneStarts, and this is the sort of thing that hockey analytics is going to need to get into eventually. As it so happens, I’ve been digging into this recently, and I have a bunch of data handy that will shed light on a few things.

    There are 1269 D-seasons between 2007 and 2012 in which a defenceman was on the ice for at least 100 faceoffs. (I’m counting each season separately, so an individual defenceman can appear as many as five different times.) We’ll start with ZoneStart and Corsi.

    At the risk of pointing out the obvious, you can see that the Corsi% numbers are clustered much more tightly than the ZoneStart numbers. There are 219 NHL D-seasons between 2007-12 with a ZoneStart at or below 44.9. There are only 149 NHL D-seasons with a Corsi% at or below 44.9%. The flip side of that is that there are 267 NHL D-seasons at or above a ZoneStart of 55.0 and only 100 D-seasons with a Corsi% above 55%. Corsi% for defencemen is, for all intents and purposes, clustered between 40% and 60%. ZoneStarts aren’t.
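    The tail counts above come from a simple filter-and-count pass. This is a minimal sketch, assuming a hypothetical `DSeason` record per defenceman-season (the field names are illustrative, not from any particular data source) and treating the high Corsi% threshold as "at or above 55", which is an assumption since the text says "above 55%".

```python
from dataclasses import dataclass


@dataclass
class DSeason:
    # Hypothetical record for one defenceman-season; field names are illustrative.
    player: str
    season: str
    faceoffs_on_ice: int
    zone_start: float  # OZ / (OZ + DZ) faceoffs, in percent
    corsi_pct: float   # 5v5 Corsi for / (for + against), in percent


def qualifying(seasons, min_faceoffs=100):
    """Keep every season on its own if the defenceman was on for >= min_faceoffs draws."""
    return [s for s in seasons if s.faceoffs_on_ice >= min_faceoffs]


def tail_counts(seasons, low=44.9, high=55.0):
    """Count seasons at or below `low` and at or above `high`, for both ZS and Corsi%."""
    return {
        "low_zs": sum(s.zone_start <= low for s in seasons),
        "low_corsi": sum(s.corsi_pct <= low for s in seasons),
        "high_zs": sum(s.zone_start >= high for s in seasons),
        "high_corsi": sum(s.corsi_pct >= high for s in seasons),
    }
```

    Because each season is filtered on its own, the same player can contribute up to five records, matching the counting described above.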

    It’s interesting to me that defencemen seem far more likely to put up low Corsi seasons than high Corsi seasons. I suspect that there’s a simple truth about how hockey works that explains this. Imagine that you’re Marc-Andre Bergeron, a ZoneStart All-Star a few times. Despite getting all of those starts in the offensive zone, you’re still going to spend a significant chunk of time in your own end because defencemen only change going one way – towards the offensive zone. Forwards can get away with changing while the puck is heading the wrong way. Even if you start in the offensive zone, unless your team keeps the puck there for an entire shift, you aren’t getting off the ice without the puck coming out and likely coming into your own end.

    It’s different for guys with low ZS. Nick Schultz is the all-time champion here, posting a ZS of 26.5 for the 2008-09 Minnesota Wild. A guy with a high ZS is all but guaranteed to be on the ice for a slice of defensive zone time; Schultz had no corresponding guarantee that he’d be on the ice for a slice of offensive zone time. Whenever the puck leaves the defensive zone, Schultz can probably change.

    The ratio of D-seasons with low Corsi% to D-seasons with low ZS is 149/219, or 68%. For high Corsi% to high ZS, it’s 100/267, or 37.5%. The way in which defencemen change in hockey games would provide an explanation for that, I think.
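    The two ratios are plain division of the counts quoted earlier; a quick Python check of the arithmetic:

```python
def tail_ratio(corsi_count, zs_count):
    """Ratio of the Corsi% tail count to the ZoneStart tail count on the same side."""
    return corsi_count / zs_count


low = tail_ratio(149, 219)   # Corsi% <= 44.9 seasons vs ZS <= 44.9 seasons
high = tail_ratio(100, 267)  # Corsi% above 55 seasons vs ZS >= 55.0 seasons
print(f"low: {low:.1%}, high: {high:.1%}")  # low: 68.0%, high: 37.5%
```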

    Local astronomer Bruce McCurdy asks whether we lose something by discarding neutral zone draws from ZoneStart. As readers are presumably aware, the ZoneStart formula is OZ faceoffs / (OZ faceoffs + DZ faceoffs). Bruce quite reasonably points out that “…there’s a difference between, say 30D + 50N + 20O & 60 + 0 + 40, even as both are 40% OZ%.” The assumption inherent in ZS is that everyone’s going to get about the same number of neutral zone draws and that there aren’t faceoff effects as a result of them. I did a quick check of my sample of defencemen to see whether that holds.
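    Bruce’s example can be checked directly: both faceoff profiles produce the same ZoneStart even though their neutral-zone shares are completely different. A small sketch; the function names are hypothetical and ZoneStart is expressed in percent to match the post.

```python
def zone_start(oz, dz):
    """ZoneStart: offensive-zone share of non-neutral faceoffs, in percent."""
    return 100 * oz / (oz + dz)


def nz_share(oz, nz, dz):
    """Share of all on-ice faceoffs that took place in the neutral zone, in percent."""
    return 100 * nz / (oz + nz + dz)


# Bruce McCurdy's two profiles: 30 DZ + 50 NZ + 20 OZ versus 60 DZ + 0 NZ + 40 OZ
for dz, nz, oz in [(30, 50, 20), (60, 0, 40)]:
    print(zone_start(oz, dz), nz_share(oz, nz, dz))
# 40.0 50.0
# 40.0 0.0
```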

    Truthfully, I think it holds up pretty well. 1156 of the 1269 D-seasons under examination saw between 35% and 45% of the faceoffs that they were on-ice for take place in the neutral zone. I doubt that it’s a significant factor, except possibly at the absolute extremes, although I would expect it to show up more in the rate at which Corsi events take place when a player is on the ice than in the percentage of them that his team achieves.
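    The neutral-zone check amounts to computing each season’s NZ share of on-ice faceoffs and counting how many fall inside the band. A minimal sketch, assuming shares are already in percent and that the 35% and 45% endpoints are inclusive (the post doesn’t say); `share_in_band` is a hypothetical helper.

```python
def share_in_band(nz_shares, lo=35.0, hi=45.0):
    """Fraction of seasons whose neutral-zone faceoff share (in percent) lies in [lo, hi]."""
    inside = sum(lo <= s <= hi for s in nz_shares)
    return inside / len(nz_shares)


# The post's headline figure: 1156 of 1269 D-seasons inside the band
print(f"{1156 / 1269:.1%}")  # 91.1%
```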

    Colby Cosh comments that the effects of ZoneStart on Corsi% seem to be non-linear. I think that there’s something to this. I’ve graphed the average Corsi% for the six groups of players with ZoneStarts between 35-39.9 and 60-64.9. As ZoneStart increases, Corsi% increases. There’s something going on, though: for the 35-39.9 group, the average Corsi% is 44.7%. For the 60-64.9 group it’s 53.3%. Both groups are roughly 10-15 points of ZoneStart away from 50, and yet the group with tough ZoneStarts sits 5.3 percentage points below 50% Corsi while the group with easier ZoneStarts is just 3.3 percentage points above it. I’d guess that this is tied into what I mentioned before, about how and when defencemen change.
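    The grouping behind that graph can be sketched as 5-point ZoneStart bins with a mean Corsi% per bin. This is a hypothetical sketch (the binning helper and its edge handling are assumptions, not the author’s code), followed by the asymmetry arithmetic using the two end-bin averages quoted above.

```python
from collections import defaultdict
from statistics import mean


def bin_label(zs, width=5):
    """Map a ZoneStart to the lower edge of its bin, e.g. 37.2 -> 35."""
    return int(zs // width) * width


def avg_corsi_by_bin(seasons):
    """seasons: iterable of (zone_start, corsi_pct) pairs -> {bin lower edge: mean Corsi%}."""
    groups = defaultdict(list)
    for zs, corsi in seasons:
        groups[bin_label(zs)].append(corsi)
    return {b: mean(v) for b, v in sorted(groups.items())}


# Distance from 50% for the two end bins quoted in the post
tough, easy = 44.7, 53.3
print(round(50 - tough, 1), round(easy - 50, 1))  # 5.3 3.3
```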

    I’ll throw out one other graph here. I went through and created pairs of D-seasons to examine. Any defenceman who was on the ice for at least 100 faceoffs in consecutive seasons constitutes a pair. So, 2007-08 Zbynek Michalek, who had a 51.1% Corsi% with a 55.3 ZS, is paired with 2008-09 Michalek, who had a 43.4% Corsi% with a 35.5 ZS. For each pair, I then took the difference between the Corsi% in the two seasons and the difference between the ZS in the two seasons. There’s actually a reasonably strong correlation between those two differences: the correlation coefficient is 0.5. Accordingly, as a player’s ZS increases or decreases, his Corsi% tends to move along with it.
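    A minimal sketch of the pairing step and the correlation of the differences, assuming qualifying D-seasons are stored in a dict keyed by (player, first year of the season); the key layout and helper names are hypothetical. The correlation is ordinary Pearson’s r, written out by hand to stay dependency-free, and the Michalek pair from the text serves as the usage example.

```python
from math import sqrt


def season_deltas(seasons):
    """seasons: {(player, start_year): (corsi_pct, zone_start)} for qualifying D-seasons.
    Returns (delta_zs, delta_corsi) for each player present in consecutive seasons."""
    deltas = []
    for (player, year), (corsi, zs) in seasons.items():
        nxt = seasons.get((player, year + 1))
        if nxt is not None:
            deltas.append((nxt[1] - zs, nxt[0] - corsi))
    return deltas


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Michalek: 2007-08 (51.1 Corsi%, 55.3 ZS) -> 2008-09 (43.4 Corsi%, 35.5 ZS)
d = season_deltas({("Michalek", 2007): (51.1, 55.3), ("Michalek", 2008): (43.4, 35.5)})
print([(round(a, 1), round(b, 1)) for a, b in d])  # [(-19.8, -7.7)]
```

    Over the full set of pairs, `pearson` would be applied to the first and second elements of every delta tuple.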

    What can we take from this? I’m a believer that ZS matters, although I suspect that it hurts guys with a tough ZS more than it helps guys with an easier one. I doubt that the neutral ZoneStarts skew things all that much, although it’s probably worth checking whether or not guys with higher shares of NZ ZoneStarts tend to do differently than guys with lower shares of NZ ZoneStarts. That’s kind of where I think we are on this.

    Email Tyler Dellow at tyler@mc79hockey.com


    25 Responses to ZoneStarts and Defencemen

    1. Steven Olson
      April 17, 2013 at

      I am not sure how you can suggest the data in “Average Corsi for defenseman with Zonestart between x and y” is non-linear. It looks extremely linear to me. And if you take the center point of each zonestart bin (e.g. 37.5 for 35-39.9) for x, and plot them in a scatter plot, you can see that it is, in fact, extremely linear!

      Is what you meant that the slope of this linear relationship is not 1? (Using my estimates for the y values, I got ~0.35 as my slope.)
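      For reference, an endpoint slope can be computed from the two bin averages quoted in the post, using bin midpoints as suggested above. This is only a two-point estimate, since the post doesn’t list the intermediate bin averages, so it is not a fitted regression slope.

```python
# Bin midpoints for the 35-39.9 and 60-64.9 groups, paired with their quoted average Corsi%
x1, y1 = (35 + 39.9) / 2, 44.7
x2, y2 = (60 + 64.9) / 2, 53.3

# Rise over run between the two end bins
slope = (y2 - y1) / (x2 - x1)
print(round(slope, 2))  # 0.34
```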

      • fourfourtwo
        April 17, 2013 at

        I think he means more that it’s not a 1-to-1 unit relationship (as you mentioned) and that the two groups aren’t symmetric around the 50% midpoint.
        I also think the rate of change changes as it gets further from the middle, similar to the convexity in bond prices vs. interest rates.

        • Tyler Dellow
          April 18, 2013 at

          Yeah – I may have been using some technical language where I intended to make a more non-technical point.

    2. palefire
      April 17, 2013 at

      I’m slightly confused by the distribution of Corsi among D-men. Unless I misunderstand (in which case, please correct me!), since there are two defensemen on the ice for both teams in almost all situations, the average Corsi among all D-men has to be pretty much 50%. But your graph suggests (unless there’s some crazy distortion because your bins aren’t narrow enough) that the average is below 50%. So I think the question that needs answering is why that is.

      Two possibilities come to mind which would be easy for you to test. One is that the D-men you’ve excluded (the ones who didn’t get 100 faceoffs) have an average above 50% that balances out the D-men you’ve included; perhaps because they’re weaker players they’re getting better zone starts. The second possibility is that the average really does get skewed by situations where there aren’t exactly two defensemen on the ice. There are pretty much never three+ or zero defensemen on the ice; one defenseman happens mostly on a 4 forward/1 defenseman power play configuration. Then you would be racking up N positive Corsi events for the one defenseman on the PP, and -N events *each* for the two defensemen killing the penalty, skewing the net D-man Corsi below 50%. This you could test by checking whether the D-man Corsi distribution changes if you restrict to ES Corsi.
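      The power-play skew described above can be illustrated with pooled arithmetic. A hypothetical sketch, assuming every defenceman on the ice is credited with his own team’s shot attempts and charged with the opponent’s, and that the power play uses one defenceman against two; the attempt counts in the usage lines are made up for illustration.

```python
def d_corsi_share(pp_attempts, pk_attempts, pp_d=1, pk_d=2):
    """Pooled Corsi% across all defencemen on the ice during a power play.
    pp_attempts: attempts by the power-play team; pk_attempts: by the shorthanded team."""
    cf = pp_d * pp_attempts + pk_d * pk_attempts  # each D credited with his team's attempts
    ca = pp_d * pk_attempts + pk_d * pp_attempts  # and charged with the opponent's
    return 100 * cf / (cf + ca)


print(round(d_corsi_share(60, 20), 1))          # 41.7 (one PP defenceman)
print(d_corsi_share(60, 20, pp_d=2, pk_d=2))    # 50.0 (two on each side)
```

      With the same number of defencemen on each side the pooled figure is exactly 50%, so the skew comes entirely from the 1-versus-2 configuration.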

      • palefire
        April 17, 2013 at

        (Of course you’re probably using 5-on-5 Corsi here already, in which case there’s only the first possibility, or else some misunderstanding on my part, or some distortion because of the bins in your graph.)

        • Tyler Dellow
          April 18, 2013 at

          I get the average Corsi as being 49.6%. I’m just using 5v5. I assume that this is to do with the distribution of ice time, i.e. one 51% guy playing 1200 minutes and two 45% guys playing 300 minutes each gives a simple average of 47%.
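          The ice-time point is the difference between averaging players and averaging minutes. A minimal sketch using the numbers from the reply; weighting Corsi% by minutes only approximates pooling the underlying shot attempts, since attempt rates per minute differ between players.

```python
def simple_average(players):
    """players: list of (corsi_pct, minutes). Each player counts once."""
    return sum(c for c, _ in players) / len(players)


def toi_weighted_average(players):
    """Each player's Corsi% weighted by his ice time."""
    total = sum(m for _, m in players)
    return sum(c * m for c, m in players) / total


example = [(51, 1200), (45, 300), (45, 300)]
print(simple_average(example), toi_weighted_average(example))  # 47.0 49.0
```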

          • Mike
            April 18, 2013 at

            It may also be that the set of all defencemen that don’t make the cutoff (i.e. less than 100 faceoffs) have a corsi of 50.4%

            Super-sheltered callups and what have you.

          • palefire
            April 18, 2013 at

            Ice time — ah yes, of course, that must be it. Anyway, the main reason I posted my comment is that whatever explained the average Corsi being below 50% was probably also going to explain why there are more defensemen with low Corsi than with high Corsi (and indeed I think it does).

            (An additional effect: if low Corsi [weak] defensemen typically play fewer minutes than high Corsi [strong] defensemen, then their Corsi sample sizes will be lower, so you would expect more (and more dramatic) outliers for low Corsi d-men than for high Corsi d-men.)

    3. April 18, 2013 at

      The problem with this type of analysis is context and ultimately that correlation doesn’t imply causation.

      Let’s take your third chart where you take average corsi% by zone start group. That chart seems to make it pretty clear that zone starts greatly impact corsi% but the problem is I suspect that QoT would show a very similar pattern. Bad offensive players don’t start often in the offensive zone (and thus generally start more often in the defensive zone) and if you are a bad offensive player it is difficult to get a good corsi%. Does Max Lapierre have a bad CF20 because he starts so often in the defensive zone or because he is a weak offensive player? So, if you are a defenseman that starts a lot of the time in the defensive zone you are playing with players that are generally speaking weaker offensive players and thus likely weaker overall. So, is that defenseman’s resulting bad corsi% the result of his defensive zone starts or his weaker quality of teammates? I will argue the weaker quality of teammates is the far more significant factor.

      The same problem arises in your final chart. If a defenseman gets a significantly different zone start profile from one season to the next it probably means he also has a somewhat different set of team mates he shares the ice with so again, was the change in corsi% due to the zone start or due to the quality of team mates.

      (QoC is probably a bit of a factor too but I argue that it is a substantially smaller factor than QoT as well)

      • Gaelan
        April 18, 2013 at

        It is quite easy to plot the relationship between zone starts and corsi on a graph. If you do this you see a very distinct linear relationship. This linear relationship holds for subsets of the player population as well.

        Now I haven’t done it for the specific subgroups you mention. However, that theory is testable. Create a pool of Max Lapierre type players who aren’t good offensively and another pool of offensive players, and plot these subgroupings (zone starts vs. corsi). If you are correct you’ll get a different relationship between zone starts and corsi between the two groups. If you are wrong the relationship will remain constant.

        The QoC and QoT comment is a different argument. Isn’t that what multiple regression is for?
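        The subgroup test proposed above could be sketched as one least-squares line per group, then a comparison of the slopes. This assumes group labels have already been assigned by some metric independent of ZoneStart, which is the hard part discussed in the replies; the helper names and group labels are hypothetical.

```python
from collections import defaultdict


def ols_fit(xs, ys):
    """Ordinary least squares fit of y = a + b*x; returns (slope b, intercept a)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return b, my - b * mx


def slope_by_group(rows):
    """rows: iterable of (group_label, zone_start, corsi_pct).
    Fits a separate ZS -> Corsi% line for each group and returns the slopes."""
    xs, ys = defaultdict(list), defaultdict(list)
    for group, zs, corsi in rows:
        xs[group].append(zs)
        ys[group].append(corsi)
    return {g: ols_fit(xs[g], ys[g])[0] for g in xs}
```

        Similar slopes across independently defined groups would support a constant zone start effect; clearly different slopes would point the other way.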

        • April 18, 2013 at

          Yeah, the problem is you are taking players who are playing with different players and against different players and with different zone starts and concluding that the linear variation in corsi% is due to zone starts and not those other factors. Multiple linear regression should help isolate the effect, but we first need reasonable metrics for QoC and QoT.

          Small sub groups will help but you need to create those sub groups using an independent metric. i.e. how do we identify all the Lapierre types using a metric independent from zone starts which is what we are trying to test. That isn’t necessarily easy to do.

          The proper way to do this, I believe anyway, is instead of trying to identify all the Lapierre type players is to compare Lapierre when he starts in the defensive zone to Lapierre when he starts in the offensive zone to Lapierre when he starts in the neutral zone. When I did something similar, I found the changes relatively minor for the majority of players and only somewhat significant for the few extremes.

          • dawgbone
            April 18, 2013 at

            Is that just for one season?

            If so, is that enough data?

          • Gaelan
            April 18, 2013 at

            I agree that the issue is independently identifying criteria to use in your subgroups. Otherwise you’ll run into tautological reasoning. One way to get around it is to run the data many different ways. None of the ways on their own would be independent but if you ended up with similar data no matter how you divided the groups that would be strong evidence that the zone start effect was constant.

            • April 18, 2013 at

              I account for zone starts by eliminating the first 10 seconds after a zone face off as it has been found that the majority of the zone start effect is gone within 10 seconds of the face off (see http://www.arcticicehockey.com/2010/8/20/1633298/offensive-zone-faceoffs-and or http://hockeyanalysis.com/2012/01/23/adjusting-for-zone-starts/). So, take a look at how things change for a few players with extreme zone starts (2 year 2010-12 data).

              H. Sedin 5v5 FF%: 55.2
              H. Sedin 5v5 ZS Adj. FF%: 53.0

              Malhotra 5v5 FF%: 42.5
              Malhotra 5v5 ZS Adj. FF%: 43.9

              P. Kane 5v5 FF%: 54.7
              P. Kane 5v5 ZS Adj. FF%: 54.9

              Bolland 5v5 FF%: 49.0
              Bolland 5v5 ZS Adj. FF%: 50.8

              Those aren’t huge differences and certainly less than one might conclude from the analysis in this post. Maybe I am biased but I am more inclined to accept my 10 second analysis than that in this post which makes the assumption that all of the observed correlation is due to zone starts and not other factors.
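              The 10-second adjustment described above can be sketched as a filter that drops on-ice unblocked shot attempts in the first few seconds after an offensive- or defensive-zone faceoff and then recomputes FF%. Everything here is an assumption for illustration: the `Event` and `Faceoff` types, the `"fenwick_for"`/`"fenwick_against"` labels, and treating the window as half-open. The events are assumed to be already restricted to the player’s 5v5 on-ice time.

```python
from dataclasses import dataclass


@dataclass
class Event:
    time: float  # game clock in seconds
    kind: str    # hypothetical labels: "fenwick_for" or "fenwick_against"


@dataclass
class Faceoff:
    time: float
    zone: str    # "OZ", "NZ" or "DZ" from the player's perspective


def zs_adjusted_ff_pct(events, faceoffs, window=10.0):
    """FF% after dropping events within `window` seconds after an OZ or DZ faceoff.
    Neutral-zone draws are ignored; raises ZeroDivisionError if nothing is left."""
    zone_draws = [f.time for f in faceoffs if f.zone in ("OZ", "DZ")]

    def near_draw(t):
        # True if the event happened in [draw, draw + window) for any zone draw
        return any(0 <= t - d < window for d in zone_draws)

    kept = [e for e in events if not near_draw(e.time)]
    ff = sum(e.kind == "fenwick_for" for e in kept)
    fa = sum(e.kind == "fenwick_against" for e in kept)
    return 100 * ff / (ff + fa)
```

              Setting `window=0` keeps every event and returns the unadjusted FF%, which makes the before/after comparison in the list above straightforward.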

    4. April 18, 2013 at

      Love this man. Good stuff as usual.

      I wonder if zone start to GF% would look similar.

    5. Tyler Dellow
      April 18, 2013 at

      “That chart seems to make it pretty clear that zone starts greatly impact corsi% but the problem is I suspect that QoT would show a very similar pattern. Bad offensive players don’t start often in the offensive zone (and thus generally start more often in the defensive zone) and if you are a bad offensive player it is difficult to get a good corsi%.”

      There’s basically no relationship between the QoT Corsi numbers and ZoneStart.

      “So, is that defenseman’s resulting bad corsi% the result of his defensive zone starts or his weaker quality of teammates? I will argue the weaker quality of teammates is the far more significant factor.”

      Well, there’s evidence in support of what I’m saying. I’m not sure that there’s evidence in support of what you’re saying.

      • April 18, 2013 at

        Just curious, where did you get your data from? My site, BTN, or did you calculate it yourself?

    6. Tyler Dellow
      April 18, 2013 at

      BTN. Do you have data that says something different?

      • April 18, 2013 at

        The problem with BTN is they include goalie pulled situations, which likely have a somewhat significant impact on the numbers and make the correlation between zone starts and corsi appear more significant than it should.

        I am seeing a link between zone starts and QoT which also mitigates the magnitude of the zone start impact you identify above. Will probably get a post up tomorrow.

    7. Pingback: Further investigation into impact of zone starts - HockeyAnalysis.com

    8. Pingback: The Value of a Zone Start | Statistical Sports Consulting

    9. Jon K
      May 6, 2013 at

      Finally, I’ve been waiting for someone to look at the numbers underlying this relationship. Logically it makes sense that corsi would be firmly correlated with zone start.

      I also wonder if there would be benefit to looking at the defencemen who have corsi above the statistical mean for their zone start range?
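      The suggestion above amounts to a residual: each season’s Corsi% minus the average Corsi% of its ZoneStart bin. A minimal, hypothetical sketch with 5-point bins (the bin width and data layout are assumptions):

```python
from collections import defaultdict
from statistics import mean


def corsi_vs_bin_mean(seasons, width=5):
    """seasons: list of (player, zone_start, corsi_pct) tuples.
    Returns (player, residual) pairs, where residual = Corsi% minus the mean
    Corsi% of all seasons in the same ZoneStart bin."""
    bins = defaultdict(list)
    for _, zs, corsi in seasons:
        bins[int(zs // width)].append(corsi)
    bin_mean = {b: mean(v) for b, v in bins.items()}
    return [(p, corsi - bin_mean[int(zs // width)]) for p, zs, corsi in seasons]
```

      Positive residuals flag defencemen outperforming others with similar zone starts.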

      Has this relationship been examined for forwards? It seems unlikely that the relationship would be as strong, in part for some of the reasons identified in your posting.

    10. Pingback: What is 'open play' hockey? - HockeyAnalysis.com

    Leave a Reply

    Your email address will not be published. Required fields are marked *