Backhand Shelf’s Justin Bourne made a tongue-in-cheek comment last night that caught my eye:
All players with negative Corsis are bad at hockey & ones with positive ones are good and ones in the middle are okay.
He’s kidding but I think there’s a kernel of truth in what he says, in terms of how Corsi data gets abused. First, there are a lot of people who talk about hockey from a data point of view who treat it that way. Second, there are a lot of people who are opposed to the use of data in evaluating hockey players and teams who pretend that that’s what the data people think when, in reality, I don’t think that’s how people who are interested in this stuff really view it.
The starting point for me, when I’m looking at Corsi data, is always remembering that there are different expectations depending on where you are in the lineup. No hockey team is made up entirely of Datsyuks, puck possession wizards who crush the opposition in terms of shots when they’re on the ice. Generally speaking, as you go further down the lineup, the players get weaker in terms of their ability to gain and keep possession of the puck.
I wanted to illustrate this because I think that the ranges of performance are something that doesn’t get talked about enough. People say “Oh, he’s in the black, he’s good” or “Well, he’s in the red, he stinks” and it misses the nuance that the “that’s ok” line is at different spots depending on where you are in the lineup. There are absolutely guys in the NHL who would have terrible possession numbers if thrust into a top line role who are valuable players lower down the lineup.
To do that, I took the 2011-12 data and sorted all 595 forwards who appeared in at least one game by their average TOI. I then sorted the players into four buckets. I wanted to reflect NHL reality so I tried to create four buckets of roughly equal size in terms of the number of games played by the players in each. The Behind the Net (BTN) data shows players listed as forwards as having played a combined 29325 games last year. I simply went down my list sorted by TOI/G and when I got as close to 7331 as possible (a quarter of 29325), I started adding players to my next bucket.
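For anyone who wants to reproduce the bucketing, here is a minimal sketch of the procedure described above, in Python. The `Forward` class, the `split_by_games` name and the sample numbers are all made up for illustration; they are not the BTN data or the spreadsheet actually used for this post. The one assumption worth flagging is how "as close as possible" is resolved: a bucket is closed when adding the next player would leave it further from the cut than stopping would.

```python
from dataclasses import dataclass


@dataclass
class Forward:
    name: str             # hypothetical label, not a real player
    toi_per_game: float   # average time on ice per game, in minutes
    games: int            # games played


def split_by_games(players, n_buckets=4):
    """Walk down the list from most to least TOI/G and start a new bucket
    once the running games total is as close as it will get to the next cut."""
    ordered = sorted(players, key=lambda p: p.toi_per_game, reverse=True)
    total_games = sum(p.games for p in ordered)
    buckets = [[] for _ in range(n_buckets)]
    running = 0   # games assigned so far, across all buckets
    idx = 0       # bucket currently being filled
    for p in ordered:
        if idx < n_buckets - 1 and buckets[idx]:
            target = total_games * (idx + 1) / n_buckets
            # Assumption: close the bucket if adding this player would
            # overshoot the cut by more than stopping here undershoots it.
            if abs(running + p.games - target) > abs(running - target):
                idx += 1
        buckets[idx].append(p)
        running += p.games
    return buckets


# Made-up sample: eight forwards with ten games each.
sample = [Forward(f"F{i}", 20.0 - i, 10) for i in range(8)]
for line, bucket in enumerate(split_by_games(sample), start=1):
    print(line, [p.name for p in bucket], sum(p.games for p in bucket))
```

Splitting on games played rather than on a head count keeps call-ups who appeared in a handful of games from distorting the size of any one bucket.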
Now, if NHL coaches try to put their better players on the ice more than their lesser players, I should be able to see some indication of this in the data. Let’s see how the players in my four buckets did in a variety of categories.
Now, it’s entirely possible that I’m easily impressed but I always like it when theory and data match up. It’s one of the things that I find so amusing about the resistance that a lot of hockey people have to data – honestly, a lot of the time, the data says things that hockey people would believe anyway, with the added benefit of providing some scale as to how significant some phenomenon or tactic is. Who could oppose this? *Thinks back to the taunting of nerd Elliotte Friedman on Hockey Night In Canada by the cool professional hockey players, nods*
A note on the PDO: the BTN data doesn’t have empty net goals backed out of it. The guys in my first bucket are going to take the bullet on the vast majority of ENG against, which is going to make their on-ice save percentage look worse than it really is. I don’t doubt that save percentage ticks up a little bit as you go down the lineup – your first liners aren’t often playing against the opposition’s ham-handed fourth liners who blow more of the chances that they create – but I doubt that it’s that severe.
Here’s the thing though: note the Corsi% difference between the average guy in bucket 1 and the average guy in bucket 4: 52.1% to 47.1%. If you’re using Corsi data and not taking the role that a player plays into consideration, you’re missing out on a huge chunk of information. Another point, in relation to coaching: if you add up the SF/60 and SA/60 lines for the first and fourth liners, you see a pretty huge gap. First liners see 60.8 S/60 when they’re on the ice; fourth liners see just 53.9 S/60. Coaching and defensive play kill entertainment.
I wanted to take a further step into this, so I sorted each of my four buckets of players by their Corsi percentage when they were on the ice and then went through a similar process to the one by which I created my initial four buckets and created five sub-buckets in each group. So what we’re now dealing with are four buckets created on the basis of ice time and then five buckets within those four buckets created on the basis of Corsi. I think it’s a pretty interesting table:
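The two-level sort can be sketched the same way. This is only an illustration under stated assumptions: the `weighted_buckets` and `toi_then_corsi` names and the `toi_per_game`, `corsi_pct` and `games` keys are hypothetical, and the splitter uses a midpoint rule (a player lands in whichever bucket contains the middle of his block of games) as a simpler stand-in for the "as close to the cut as possible" walk described earlier.

```python
def weighted_buckets(rows, key, n, weight="games"):
    """Sort rows by `key`, highest first, and split them into n groups of
    roughly equal total weight (games played by default)."""
    ordered = sorted(rows, key=lambda r: r[key], reverse=True)
    total = sum(r[weight] for r in ordered)
    groups = [[] for _ in range(n)]
    if total == 0:
        return groups
    seen = 0
    for r in ordered:
        # Midpoint rule (a simplification): place the row by where the
        # middle of its weight falls in the cumulative total.
        midpoint = seen + r[weight] / 2
        groups[min(int(midpoint * n / total), n - 1)].append(r)
        seen += r[weight]
    return groups


def toi_then_corsi(rows, n_lines=4, n_tiers=5):
    """Return grid[line][tier]: ice-time buckets first, then Corsi% tiers
    inside each ice-time bucket."""
    return [
        weighted_buckets(line, "corsi_pct", n_tiers)
        for line in weighted_buckets(rows, "toi_per_game", n_lines)
    ]
```

Indexing the result as `grid[0][0]` gives the high TOI, high Corsi group and `grid[3][4]` the low TOI, low Corsi group, which matches the four-by-five layout described above.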
A word about columns on the far left: I wanted to create a sense of the scope of the differences between good and bad X liners, so I took their GD/60 and multiplied it by the average ice time for that line and then by 82 games to convert things into goal difference over the course of a season. Then, for ease of reference, I set the worst sub-group by Corsi as being worth zero goals and compared the others to that.
So, for example, first liners play an average of 14.96 5v5 minutes per night. 14.96*82 equals 1227.1 “first line” minutes over the course of an NHL season. The worst of the five groups of first liners that I identified had a GD/60 of -0.13. -0.13 multiplied by 1227.1/60 equals -2.56. Over the course of a season, we’d expect a team with a first line drawn from that group (subject to some caution below), to have a goal difference of -2.5 when those guys are on the ice. That doesn’t sound terrible – until you realize that 40% of the “first liners” in the league are providing a goal difference of 10.5 or better with that same ice time and you realize that your first line leaves you 13 goal difference back.
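The conversion is simple enough to write down as a couple of functions. This is a sketch, not the spreadsheet behind the table; the function names are mine. One caveat: feeding in the rounded figures quoted above (14.96 minutes, -0.13 GD/60) gives roughly 1226.7 minutes and about -2.66 goals rather than 1227.1 and -2.56, which I assume is because the table was built from unrounded values.

```python
def season_goal_diff(gd_per_60, minutes_per_game, games=82):
    """Scale an on-ice goal difference rate (per 60 minutes) to a full season."""
    season_minutes = minutes_per_game * games
    return gd_per_60 * season_minutes / 60


def relative_to_baseline(goal_diffs, baseline_index):
    """Re-express each group's season goal difference against one group,
    e.g. the worst sub-group by Corsi, which is then worth zero."""
    baseline = goal_diffs[baseline_index]
    return [gd - baseline for gd in goal_diffs]


# Worst first-line group, using the rounded inputs quoted in the text.
print(round(season_goal_diff(-0.13, 14.96), 2))

# The -2.5 versus 10.5 comparison from the text, with the worst group as zero.
print(relative_to_baseline([-2.5, 10.5], baseline_index=0))
```

Setting the worst group to zero changes nothing about the gaps between groups; it just makes the table easier to read at a glance.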
This is a seriously fascinating table if you’re into this stuff. I find it pretty amazing that the shooting percentage is so tightly matched within each of the four groups. By that, I mean that I’ve sorted this by ice time and, somehow, this results in buckets whereby the guys who are most likely to finish are getting the most TOI and, as TOI falls, so too does finishing. The second thing that I find interesting flows from that – once you’ve sorted by ice time, Corsi looks like a pretty solid indicator of how well a first, second, third or fourth line has performed, relative to other first, second, third or fourth lines. It’s not like there are groups with poor Corsi relative to the class that they’re in who are able to make it up with their finishing skills.
The names of the players in the high Corsi/high TOI box are pretty fun: twenty guys, including Alex Steen, Pavel Datsyuk, Evgeni Malkin, James Neal, Patrick Sharp, Jonathan Toews, Anze Kopitar, Johan Franzen, Gabriel Landeskog, Joe Thornton, Logan Couture, Jordan Staal, Ryan O’Reilly, Daniel Sedin, Henrik Zetterberg, David Backes, Patrick Marleau and Dustin Brown. (I note that people who are paid to consult on hockey might speculate that Jordan Staal was carried by Tyler Kennedy). It is basically a glittering list of star NHL forwards with whom you can win Stanley Cups if they’re on your top line. Except, I suppose, for Joe Thornton.
One of the problems that a lot of people kind of intuitively have with Corsi is that you go to BTN, you pull up the list of forwards sorted by Corsi and BAM: “These idiots think that Brad Marchand was one of the best forwards in the NHL in 2011-12? Watch a game, nerds.” Taking ice time into account seems to produce a list that jibes much more with what you’d expect, while still leaving room for names that might not be on your radar.
There’s a point that has to be made here. When we’re talking about a player’s Corsi, what we’re really saying is “This player had a shot attempt share of X% when he was on the ice in his particular mix of circumstances, with his particular mix of players.” It bears repeating a lot, because it gets abused at both ends of the spectrum. In the case of the Oilers mess at the bottom of their roster, I tend to think that there are some guys there who can play who are being buried in a sea of terrible hockey players. The defencemen who are on the ice will have an impact. The flip side’s true as well – take a look at Jordan Eberle’s Corsi% with and without Taylor Hall for an example of this. The job for hockey teams, and for commentary sources, is to suss out which guys are really pushing the bus and which guys are passengers.
A word about fourth lines – I mentioned the old Vic Ferrari dictum about people getting way too excited about fourth lines when it’s their top lines that kill the team. This data is supportive of that, I think. Look at the spread between good and bad first/second/third lines and then look at the spread between good and bad fourth lines. I suspect that what I call the “win value” of the goals that the fourth line tends to be involved with is lower too. Fourth lines tend not to play in the last ten minutes when the game’s on the line. I would think that a higher percentage of the goals scored while they are on the ice are irrelevant to the outcome of a hockey game than is the case with guys who play higher up.
Of course, that doesn’t mean that you should not care about your fourth line or put a terrible one together, unless it’s a deliberate decision to permit you to focus more resources elsewhere in your lineup. You’ll note that the bottom group of fourth liners averages a Corsi% of 41.5 – Lennart Petrell, Ryan Jones, Mike Brown, Chris Vandevelde, Eric Belanger and Ben Eager are all below 40%. I’m sure that the defence that the Oilers have iced isn’t helpful but five of those guys were back from last year and they had terrible numbers by and large then too although, to be fair, not quite this bad. (Just as an aside: I don’t think the Oilers are a true talent 44.8% Fenwick-close team; I think they’re better than that or, at least, that management could reasonably have expected the top end to be better.)
In any event, keeping this sort of framework in mind when using Corsi data is probably something that could be helpful to people. Corsi’s not the only thing in the world and sometimes players with negative Corsi are good in their roles while guys with positive Corsi aren’t really.
Email Tyler Dellow at firstname.lastname@example.org