• Corsi and Context

    by Tyler Dellow • March 26, 2013 • Uncategorized • 19 Comments

    Backhand Shelf’s Justin Bourne made a tongue in cheek comment last night that kind of caught my eye:

    All players with negative Corsis are bad at hockey & ones with positive ones are good and ones in the middle are okay.

    He’s kidding but I think there’s a kernel of truth in what he says, in terms of how Corsi data gets abused. First, there are a lot of people who talk about hockey from a data point of view who treat it that way. Second, there are a lot of people who are opposed to the use of data in evaluating hockey players and teams who pretend that that’s what the data people think when, in reality, I don’t think that’s how people who are interested in this stuff really view it.

    The starting point for me, when I’m looking at Corsi data, is always remembering that there are different expectations depending on where you are in the lineup. No hockey team is made up entirely of Datsyuks, puck possession wizards who crush the opposition in terms of shots when they’re on the ice. Generally speaking, as you go further down the lineup, the players get weaker in terms of their ability to gain and keep possession of the puck.

    I wanted to illustrate this because I think that the ranges of performance are something that doesn’t get talked about enough. People say “Oh, he’s in the black, he’s good” or “Well, he’s in the red, he stinks” and it misses the nuance that the “that’s ok” line is at different spots depending on where you are in the lineup. There are absolutely guys in the NHL who would have terrible possession numbers if thrust into a top line role who are valuable players lower down the lineup.

    I wanted to try and illustrate this so I took the 2011-12 data and sorted all 595 forwards who appeared in at least one game by their average TOI. I then sorted the players into four buckets. I wanted to reflect NHL reality so I tried to create four buckets of roughly equal size in terms of the number of games played by players in it. The BTN data shows players listed as forwards as having played a combined 29325 games last year. I simply went down my list sorted by TOI/G and when I got as close to 7331 as possible (a quarter of 29325), I started adding players to my next bucket.
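    The bucketing step described above can be sketched roughly as follows. This is a minimal illustration, not the actual spreadsheet work behind the post: the `Forward` record, the `bucket_by_games` name and the exact "closest to the target" rule (close a bucket when adding the next player would move its games total further from a quarter of all games) are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Forward:
    name: str            # hypothetical field names, for illustration only
    games: int           # games played
    toi_per_game: float  # average minutes per game


def bucket_by_games(players, n_buckets=4):
    """Sort by TOI/G (highest first), then cut the list into buckets
    that each hold roughly the same number of games played."""
    ordered = sorted(players, key=lambda p: p.toi_per_game, reverse=True)
    target = sum(p.games for p in ordered) / n_buckets  # e.g. 29325 / 4

    buckets = [[] for _ in range(n_buckets)]
    idx = 0
    games_in_bucket = 0
    for p in ordered:
        # Assumed rule: start the next bucket once adding this player
        # would leave the current bucket further from the target.
        overshoots = (abs(games_in_bucket + p.games - target)
                      > abs(games_in_bucket - target))
        if idx < n_buckets - 1 and buckets[idx] and overshoots:
            idx += 1
            games_in_bucket = 0
        buckets[idx].append(p)
        games_in_bucket += p.games
    return buckets
```

    Splitting on games played rather than on the number of players keeps the call-ups who appeared in a handful of games from distorting the size of each "line".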

    Now, if NHL coaches try to put their better players on the ice more than their lesser players, I should be able to see some indication of this in the data. Let’s see how the players in my four buckets did in a variety of categories.

    Now, it’s entirely possible that I’m easily impressed but I always like it when theory and data match up. It’s one of the things that I find so amusing about the resistance that a lot of hockey people have to data – honestly, a lot of the time, the data says things that hockey people would believe anyway, with the added benefit of providing some scale as to how significant some phenomenon or tactic is. Who could oppose this? *Thinks back to the taunting of nerd Elliotte Friedman on Hockey Night In Canada by the cool professional hockey players, nods*

    A note on the PDO: the BTN data doesn’t have empty net goals backed out of it. The guys in my first bucket are going to take the bullet on the vast majority of ENG against, which is going to make their save percentage worse than their actual save percentage ability. I don’t doubt that save percentage ticks up a little bit as you go down the lineup – your first liners aren’t often playing against the opposition’s ham handed fourth liners who blow more of the chances that they create – but I doubt that it’s that severe.

    Here’s the thing though: note the Corsi% difference between the average guy in bucket 1 and guy in bucket 4: 52.1% to 47.1%. If you’re using Corsi data and not taking the role that a player plays into consideration, you’re missing out on a huge chunk of information. Another point, in relation to coaching: if you add up the SF/60 and SA/60 lines for the first and fourth liners, you see a pretty huge gap. First liners see 60.8 S/60 when they’re on the ice; fourth liners see just 53.9 S/60. Coaching and defensive play kills entertainment.

    I wanted to take a further step into this, so I sorted each of my four buckets of players by their Corsi percentage when they were on the ice and then went through a similar process to the one by which I created my initial four buckets and created five sub-buckets in each group. So what we’re now dealing with are four buckets created on the basis of ice time and then five buckets within those four buckets created on the basis of Corsi. I think it’s a pretty interesting table:

    (If you’re using a Mac, use CTRL+left click to pull up a menu that lets you open it in a different tab for easier reading.)

    A word about columns on the far left: I wanted to create a sense of the scope of the differences between good and bad X liners, so I took their GD/60 and multiplied it by the average ice time for that line and then by 82 games to convert things into goal difference over the course of a season. Then, for ease of reference, I set the worst sub-group by Corsi as being worth zero goals and compared the others to that.

    So, for example, first liners play an average of 14.96 5v5 minutes per night. 14.96*82 equals 1227.1 “first line” minutes over the course of an NHL season. The worst of the five groups of first liners that I identified had a GD/60 of -0.13. -0.13 multiplied by 1227.1/60 equals -2.56. Over the course of a season, we’d expect a team with a first line drawn from that group (subject to some caution below), to have a goal difference of -2.5 when those guys are on the ice. That doesn’t sound terrible – until you realize that 40% of the “first liners” in the league are providing a goal difference of 10.5 or better with that same ice time and you realize that your first line leaves you 13 goal difference back.
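    The conversion from GD/60 to a season goal difference can be written out as a small sketch. The function names are made up for the example, and the list passed to `relative_to_baseline` is assumed to be ordered from best Corsi sub-group to worst. Note that plugging in the rounded figures quoted above (-0.13 and 14.96) gives roughly -2.66 rather than -2.56; the numbers in the post were presumably computed from unrounded inputs.

```python
GAMES_PER_SEASON = 82


def season_goal_diff(gd_per_60, toi_5v5_per_game, games=GAMES_PER_SEASON):
    """Scale a per-60-minute goal difference up to a full season of
    5v5 ice time for that line."""
    season_minutes = toi_5v5_per_game * games
    return gd_per_60 * season_minutes / 60


def relative_to_baseline(goal_diffs, baseline_index=-1):
    """Re-express each sub-group's season goal difference relative to
    the baseline sub-group (assumed here to be the last entry, i.e. the
    worst sub-group by Corsi), which is set to zero."""
    baseline = goal_diffs[baseline_index]
    return [gd - baseline for gd in goal_diffs]


# Rounded inputs from the text: GD/60 of -0.13, 14.96 5v5 minutes a night.
worst_first_line = season_goal_diff(-0.13, 14.96)
```

    Applied to the two season totals mentioned above, `relative_to_baseline([10.5, -2.5])` returns `[13.0, 0.0]`, which is the 13-goal gap.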

    This is a seriously fascinating table if you’re into this stuff. I find it pretty amazing that the shooting percentage is so tightly matched in each of the four groups. By that, I mean that I’ve sorted this by ice time and, somehow, this results in buckets whereby the guys who are most likely to finish are getting the most TOI and that as TOI falls, so too does finishing. The second thing that I find interesting flows from that – once you’ve sorted by ice time, Corsi looks like a pretty solid indicator of how well a first, second, third or fourth line has performed, relative to other first, second, third or fourth lines. It’s not like there are groups with poor Corsi relative to the class that they’re in who are able to make it up with their finishing skills.

    The names of the players in the high Corsi/high TOI box are pretty fun: twenty guys, including Alex Steen, Pavel Datsyuk, Evgeni Malkin, James Neal, Patrick Sharp, Jonathan Toews, Anze Kopitar, Johan Franzen, Gabriel Landeskog, Joe Thornton, Logan Couture, Jordan Staal, Ryan O’Reilly, Daniel Sedin, Henrik Zetterberg, David Backes, Patrick Marleau and Dustin Brown. (I note that people who are paid to consult on hockey might speculate that Jordan Staal was carried by Tyler Kennedy). It is basically a glittering list of star NHL forwards with whom you can win Stanley Cups if they’re on your top line. Except, I suppose, for Joe Thornton.

    One of the problems that a lot of people kind of intuitively have with Corsi is that you go to BTN, you pull up the list of forwards sorted by Corsi and BAM: “These idiots think that Brad Marchand was one of the best forwards in the NHL in 2011-12? Watch a game, nerds.” Taking ice time into account seems to produce a list that jibes much more with what you’d expect, while still leaving room for names that might not be on your radar.

    There’s a point that has to be made here. When we’re talking about a player’s Corsi, what we’re really saying is “This player had a shot attempt share of X% when he was on the ice in his particular mix of circumstances, with his particular mix of players.” It bears repeating a lot, because it gets abused at both ends of the spectrum. In the case of the Oilers mess at the bottom of their roster, I tend to think that there are some guys there who can play who are being buried in a sea of terrible hockey players. The defencemen who are on the ice will have an impact. The flip side’s true as well – take a look at Jordan Eberle’s Corsi% with and without Taylor Hall for an example of this. The job for hockey teams, and for commentary sources, is to suss out which guys are really pushing the bus and which guys are passengers.
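    To make "shot attempt share" and the with/without comparison concrete, here is a rough sketch. The data layout (a list of on-ice segments, each with the set of skaters on the ice plus attempts for and against) and the function names are assumptions for illustration; real play-by-play data would need more handling than this.

```python
def corsi_pct(attempts_for, attempts_against):
    """Shot attempt share: attempts for as a percentage of all attempts
    taken while the player was on the ice. None if there were none."""
    total = attempts_for + attempts_against
    if total == 0:
        return None
    return 100.0 * attempts_for / total


def with_or_without(segments, player, teammate):
    """Return (Corsi% with teammate, Corsi% without teammate) for `player`.

    `segments` is an iterable of (on_ice, attempts_for, attempts_against),
    where `on_ice` is a set of player labels (hypothetical layout).
    """
    tallies = {True: [0, 0], False: [0, 0]}
    for on_ice, cf, ca in segments:
        if player not in on_ice:
            continue  # only count time when the player himself is out there
        together = teammate in on_ice
        tallies[together][0] += cf
        tallies[together][1] += ca
    return corsi_pct(*tallies[True]), corsi_pct(*tallies[False])
```

    A big gap between the two numbers is the kind of thing that hints at who is driving the bus and who is a passenger, though it says nothing on its own about quality of opposition or zone starts.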

    A word about fourth lines – I mentioned the old Vic Ferrari dictum about people getting way too excited about fourth lines when it’s their top lines that kill the team. This data is supportive of that, I think. Look at the spread between good and bad first/second/third lines and then look at the spread between good and bad fourth lines. I suspect that what I call the “win value” of the goals that the fourth line tends to be involved with is lower too. Fourth lines tend not to play in the last ten minutes when the game’s on the line. I would think that a higher percentage of the goals in which they are on the ice are irrelevant to the outcome of a hockey game than with guys who play higher up.

    Of course, that doesn’t mean that you should not care about your fourth line or put a terrible one together, unless it’s a deliberate decision to permit you to focus more resources elsewhere in your lineup. You’ll note that the bottom group of fourth liners averages a Corsi% of 41.5 – Lennart Petrell, Ryan Jones, Mike Brown, Chris Vandevelde, Eric Belanger and Ben Eager are all below 40%. I’m sure that the defence that the Oilers have iced isn’t helpful but five of those guys were back from last year and they had terrible numbers by and large then too although, to be fair, not quite this bad. (Just as an aside: I don’t think the Oilers are a true talent 44.8% Fenwick-close team; I think they’re better than that or, at least, that management could reasonably have expected the top end to be better.)

    In any event, keeping this sort of framework in mind when using Corsi data is probably something that could be helpful to people. Corsi’s not the only thing in the world and sometimes players with negative Corsi are good in their roles while guys with positive Corsi aren’t really.

    Email Tyler Dellow at mc79hockey@gmail.com

    19 Responses to Corsi and Context

    1. Brad
      March 26, 2013 at

      In terms of taking this and using it to make roster/lineup decisions (to the extent that it would even be possible), is it fair to say that the extreme cases in each groups would be the guys most likely being misused? Line 2 group 1, is it fair to say those are guys that should be able to hang with 1st line minutes? Or is their sh% meaningful enough to say that these are true-talent 2nd liners who are just being utilized correctly and perfectly fill their roles?

      I realize that all of that would depend on who you’re replacing on other lines. Mostly I’m just curious if there’s any reasonable statistical indicators to look out for that player A is getting too much icetime and player B getting too little, while accounting for the possibility that player B would be overwhelmed in a larger role.

      • Tyler Dellow
        March 26, 2013 at

        “Misused” is a bit tough to say. Here are the twenty guys in the line 2, group 1 bucket:

        Bergeron
        Kennedy
        Crosby
        Langenbrunner
        Marchand
        Wingels
        Williams
        Seguin
        Stalberg
        Dwight King
        Bertuzzi
        Horton
        Wellwood
        Dupuis
        Gagne
        Burrows
        Sobotka
        Hornqvist
        Goc
        Downie
        Foligno
        Penner
        Booth

        Crosby’s a bit of a goofy name in there – it’s because of his TOI last year. He had a bunch of pretty short games in terms of TOI as they eased him back in and then rested him before the playoffs. I would say he can probably handle a first line role. ;)

        The rest of those guys, a lot of them are playing behind better players or are on teams that split the ice time between good players really evenly, like Boston. Obviously, if you look at the GD, you can survive without higher level finishing if you’re that dominant possession wise; the hill probably gets steeper as you move up.

        As far as indicators go, if there is a big Corsi difference in favour of guys getting second line TOI, I might wonder if the coach had the order of things wrong.

    2. March 26, 2013 at

      Your analysis is similar to Tom Awad’s — http://www.puckprospectus.com/article.php?articleid=625 — so I’m finding it interesting to compare the results.

      In his analysis, the top lines shot much better than third lines (9.1% vs 7.7%), just like you’ve seen (9.2% vs 7.6%).

      But in his analysis, the top three lines all had the same shooting percentage against (8.1, 8.1, 8.0%), so the shooting percentage differences drove meaningful PDO differences and the conclusion that finishing talent was as important as outshooting. Whereas in your analysis, the opponent’s shooting percentage goes down as you move down the lineup (8.9%, 8.3%, 7.9%), resulting in PDOs that hover much closer to neutral and have much less impact on the results.

      So what gives? Did teams make a really large shift towards power-versus-power between ’09-10 and ’11-12? Is the difference in how you binned the players (on GP vs TOI/G) really important? Is there some other difference I’m missing?

      Curious to hear your thoughts.

      • Tyler Dellow
        March 26, 2013 at

        I think it’s a couple of things. I actually re-ran Tom’s analysis last summer and emailed him about it. He got curious and ran the data for last year and found this:

        Act SF% Act SA%
        8.7% 8.3%
        8.6% 7.7%
        7.3% 7.8%
        6.4% 7.5%

        That’s applying his methodology, which is slightly different from mine. The guess we had both had was a shift to PvP; perhaps I can refine it now and guess that it’s a shift towards running possession players on the top line.

    3. March 26, 2013 at

      Great stuff. But,

      First liners see 60.8 S/60 when they’re on the ice; fourth liners see just 53.9 S/60. Coaching and defensive play kills entertainment.

      Doesn’t this suggest that crappy players kill entertainment? Presumably the first liners are getting the same amount of coaching and similar instructions re defensive play. What is the percentage composition of Europeans in the first line group versus the fourth line group? I’d add that crappy north american grinders kill entertainment.

      The names of the players in the high Corsi/high TOI box are pretty fun: twenty guys, including Alex Steen, Pavel Datsyuk, Evgeni Malkin, James Neal, Patrick Sharp, Jonathan Toews, Anze Kopitar, Johan Franzen, Gabriel Landeskog, Joe Thornton, Logan Couture, Jordan Staal, Ryan O’Reilly, Daniel Sedin, Henrik Zetterberg, David Backes, Patrick Marleau and Dustin Brown.

      Who are the other two?

      • Tyler Dellow
        March 26, 2013 at

        Presumably the first liners are getting the same amount of coaching and similar instructions re defensive play. What is the percentage composition of Europeans in the first line group versus the fourth line group? I’d add that crappy north american grinders kill entertainment.

        A few years ago, I took Amanda to see the Oilers play the Leafs. We sat up high in the cheap seats behind one of the nets. I was explaining forechecking to her and it was amazing to watch how the different lines had different tactics. Far, far more 1-2-2 type forechecking from the bottom lines. They’re a lot more cautious than the top lines. I assume that coaches would tell their European players to play the same way if they were playing on the bottom lines.

        Who are the other two?

        Chris Kunitz and Jeremy Welsh, who played one game.

    4. Stephan Cooper
      March 26, 2013 at

      I think it’s interesting to note that the highest ice-time group is the bottom quintile of the top line group.

      I think it may be a symptom of what creates a negative possession top line player, which is typically a good player on a bad team with a bad supporting cast who’s forced to do too much. The Mikko Koivus and Stephen Weisses of the world.

    5. Trenton L.
      March 26, 2013 at

      it would be interesting to see the average salary for each bucket. i.e. do guys get paid based on shooting %/ice time or GD/60? further on a $/ GD/60 basis how would you build the best cap team? Say for example you’ve built the best team possible with $9mm left in cap space. Are you better off signing a Malkin or spreading it around through your lineup with all 3 of Penner, Hornqvist, Goc.

      Seems like bringing up the average player level in 3 spots should beat 1?

    6. jc
      March 27, 2013 at

      Re. Context and Corsi. This is no different than using +- with context, in that sometimes minus players are good and plus players are not. The push back on Corsi is that at times it is treated as the be all and end all, but in reality it is a better version of +-. And even then it has limitations.

    7. Woodguy
      March 27, 2013 at

      Really cool stuff.

      This would be interesting to have available for comparison shopping of players.

      One thing jumps out at me though:

      595 forwards who appeared in at least one game by their average TOI.

      Wouldn’t that introduce a whole pile of noise into your sample? (like Jeremy Welsh or, if you look at this year, Arcobello, who was put with 4 & 14 for one game and given 15min 5v5 TOI)

      I don’t see the reason to include those players.

      An arbitrary cut off of 20 or so games makes more sense to include only actual NHLers and get rid of small sample issues like Arcobello’s 35% CF.

      I understand that their small TOI will wash a lot of it away, but why include them in the first place?

    8. dan
      March 27, 2013 at

      Excellent post!
      Two additions. I have been thinking that some of the old ideas need to be reworked. Take, for example, top 6 vs bottom 6. I found that most teams seem to have a Top 4, a Middle 4 & a Bottom 4 – wonder if the results would be much different, as many coaches seem to keep players in pairs anyway…

      I also noticed in your table that the peak level of line 1 and line 2 is both 56% Corsi?
      Is this based on only 1 yr? I believe Gabe suggested 3 yrs & I know the impact of luck takes ~140 league games; would running the data over 2+ yrs change things?
      Thanks Dan

      • Knighttown
        March 27, 2013 at

        I’m just curious, about a week ago I made a post at LT’s suggesting that it had become fashionable to blame the Belangers of the world when in fact 20 might be performing more closely than one would expect when compared to his peers like Antoine Roussel. Not well, but maybe not as far off the gap as say, Ales Hemsky compared to Patrick Sharp for instance.

        I asked if a guy like Parkatti or Woodguy could create a “par” system to be assigned to players based on their position in the lineup based on EVTOI. First liners should be at 55%, second liners at 52% etc.

        Anyway, BAM, it’s here and I thank you for putting in the work. Have you applied this to the Oilers roster to the point where you can say which line has the worst performance compared to their peers yet?

    11. dan
      March 28, 2013 at

      any chance a similar look at d-man? very interesting to see…..
