# Two Graphs and 480 Words That Will Convince You On Corsi%

## by Tyler Dellow • March 25, 2014 • Hockey • 44 Comments

I’ve been doing some writing for broader audiences lately, which has led to me thinking a lot about how to convey ideas to people who aren’t five years or more into thinking about hockey in terms of numbers. I think that I’ve come up with a pretty succinct way of expressing why Corsi% matters that I figured I’d lay out here as a resource for other people writing this stuff who run into the same objections.

First, a hockey team’s objective is to outscore its opposition. Players want to be on the ice for more goals for than they are goals against. In other words, they want the percentage of the goals that their team scores when they’re on the ice to be above 50%, something known as GF%. **The greater your Corsi%, the greater the chance that that’s true for you**.

The cunning observer will note that if 80.9% of players with a Corsi% above 55% saw their teams score more than 50% of the goals when they’re on the ice, that means that 19.1% of players did not. This leads to my second point. **Except at the extremes, if your GF% greatly exceeds or falls below your Corsi% in one year, you can expect it to match your Corsi% the following year.**

I can illustrate this by taking the 2330 players who played at least 500 minutes in consecutive seasons between 2007-14 and creating groups based on the difference between their GF% and their Corsi%. So, for example, in 2012-13 Erik Gudbranson had a GF% of 23.8%. His Corsi% was 47.9%. That’s a difference of -24.1%. I can create groups of players who had similar seasons in terms of the difference between their GF% and Corsi% and then look at the difference between their GF% and their Corsi% in the following season.

So, for example, we can see that the group of players who had a GF% of at least 10 points worse than their Corsi% in Year One averaged 12.9% worse in Year One and saw that gap fall to 2.6% in Year Two. The guys at the other end of the scale saw the gap fall from 12.9% in Year One (i.e. a GF% of 62.9% and a Corsi% of 50%) to just 1.4% in Year Two (i.e. a GF% of 51.4% and a Corsi% of 50%). Every other group was within a point in the second year.
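
For anyone who wants to try the grouping themselves, here is a minimal sketch of the procedure in Python. It runs on synthetic players, not the real 2007-14 data: the spread of Corsi% (standard deviation 3), the size of the GF% noise (standard deviation 6), the 5-point group widths and every function name are assumptions made for illustration. It also holds each player's Corsi% fixed across both years, which the real analysis does not. Because the simulation assumes GF% is Corsi% plus luck, it shows how the method works rather than proving anything about hockey.

```python
import random
from collections import defaultdict
from statistics import mean

random.seed(0)  # synthetic data only; these are not real NHL numbers

def simulate_player():
    """Return (corsi, gf_year_one, gf_year_two) for one made-up player."""
    corsi = random.gauss(50, 3)          # assumed identical in both years
    gf_y1 = corsi + random.gauss(0, 6)   # noisy goal share, Year One
    gf_y2 = corsi + random.gauss(0, 6)   # fresh, independent noise, Year Two
    return corsi, gf_y1, gf_y2

def bucket(gap):
    """Label a Year One gap (GF% minus Corsi%) with a 5-point-wide group."""
    if gap <= -10:
        return "-10 or worse"
    if gap >= 10:
        return "+10 or better"
    low = int(gap // 5) * 5
    return f"{low:+d} to {low + 5:+d}"

def gap_by_group(players):
    """Average the Year One and Year Two gaps within each Year One group."""
    groups = defaultdict(list)
    for corsi, gf_y1, gf_y2 in players:
        groups[bucket(gf_y1 - corsi)].append((gf_y1 - corsi, gf_y2 - corsi))
    return {
        name: (mean(y1 for y1, _ in rows), mean(y2 for _, y2 in rows), len(rows))
        for name, rows in groups.items()
    }

players = [simulate_player() for _ in range(2330)]
results = gap_by_group(players)
for name, (y1_gap, y2_gap, count) in sorted(results.items(), key=lambda kv: kv[1][0]):
    print(f"{name:>14}: n={count:4d}  Year One gap {y1_gap:+5.1f}  Year Two gap {y2_gap:+5.1f}")
```

With real data you would replace `simulate_player` with each player's actual Corsi% and GF% in consecutive seasons and keep `gap_by_group` as it is.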

Once you accept this concept, it changes a lot in terms of how you think about hockey. It means that when we’re talking about players and teams and how good they are, we should be very, very leery of players/teams whose Corsi% deviates significantly from their GF%. This isn’t to say that some players can’t create an enduring edge here – Sidney Crosby had a 5v5 GF% of 62% from 2007-13 on a Corsi% of 53% – but this is not the norm. For most players, their Corsi% and GF% are going to be very tightly intertwined in the long run. In the short run, GF% is a liar.

Email Tyler Dellow at tyler@mc79hockey.com

this will be my go to for trying to convince people about shot attempt stuff. thanks

Good post and I think it does well at demonstrating the importance of corsi and its predictive power compared to goal differential. I excitedly await David Johnson’s rebuttal.

I wasn’t going to but since you asked, here is my rebuttal:

Go to http://stats.hockeyanalysis.com/ratings.php?disp=1&db=200713&sit=5v5&pos=forwards&minutes=3000&teamid=0&type=goals&sort=ShPct&sortdir=DESC and compare the top 50 forwards on the list and the bottom 50 forwards on the list and tell me again that there isn’t a significant talent difference between the two groups of players.

The 50th best player has an on-ice shooting percentage of 9.12% while the 50th worst forward has an on-ice shooting percentage of 6.98% (interestingly we are comparing Neal to Neil). On an equal number of shots the 50th best forward will be on the ice for 30% more goals.
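
The 30% figure follows directly from the ratio of the two shooting percentages. A quick check in Python, where the 1,000-shot total is a hypothetical number chosen only to make the goal counts easy to read:

```python
def goals_scored(shots, shooting_pct):
    """Goals expected from a number of shots at a given on-ice shooting percentage."""
    return shots * shooting_pct / 100

NEAL_SH_PCT = 9.12   # 50th best forward, from the comment above
NEIL_SH_PCT = 6.98   # 50th worst forward, from the comment above
SHOTS = 1000         # hypothetical, identical for both players

neal_goals = goals_scored(SHOTS, NEAL_SH_PCT)
neil_goals = goals_scored(SHOTS, NEIL_SH_PCT)
extra_pct = (neal_goals / neil_goals - 1) * 100

print(f"{neal_goals:.1f} vs {neil_goals:.1f} goals, {extra_pct:.1f}% more")
```

The result does not depend on the shot total, since it cancels out of the ratio.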

Now, if you are trying to tell me that there is significant randomness in goal data I accept that. The above tables are the outcome of such randomness. If you are trying to convince me that driving shooting percentage is not an important differentiating talent among players, I don’t accept that. I don’t accept that because every study that successfully attempts to overcome sample size issues with goal data comes up with different conclusions (http://www.hockeyprospectus.com/puck/article.php?articleid=625 for example).

I don’t think anybody has ever argued that everyone will have the exact same on ice shooting % though…

Except that is pretty much what the article above is saying:

“This isn’t to say that some players can’t create an enduring edge here – Sidney Crosby had a 5v5 GF% of 62% from 2007-13 on a Corsi% of 53% – but this is not the norm. For most players, their Corsi% and GF% are going to be very tightly intertwined in the long run.”

By saying that for the majority of players Corsi% and GF% are “very tightly intertwined” in the long run it absolutely implies that there cannot be significant differences in player on-ice shooting percentages (Unless of course you believe that those differences in on-ice shooting percentages are fully offset by opposite differences in on-ice save percentages. This is not the case though because on-ice shooting and save percentages are not near perfectly inversely correlated).

If 50th best Neal’s on-ice shooting percentage results in 30% more goals being scored than 50th worst Neil’s then CF% and GF% are not “very tightly intertwined” over the long haul. They simply can’t both be true and this isn’t just about a few players at the extremes like Sidney Crosby.

He’s saying that the vast majority of players are going to have tightly intertwined corsi and gf%. Which is obviously true based on the huge sample size he used.

He also noted that there are outliers like Crosby and that some people can have higher gf% than cf%.

Not exactly sure why you’re trying to focus on outliers on the ends of the bell curve rather than the hundreds of players toward the middle of the bell curve but ok…

And even the most extreme outliers had a much smaller difference between cf% and gf% in year two. How exactly do you explain that if gf% is the better of the two metrics?

I am not sure the 50th best and 50th worst in a group of 277 are ‘outliers’. That is the 18th and 82nd percentile. 36% of the population lie beyond them. Are 36% of the population outliers?

If those are the ‘outliers’ I am focusing on, I guess you and I have different definitions of outliers.
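
The percentile arithmetic can be checked in a few lines. This sketch counts the 50th best and 50th worst players as part of the tails, which is an assumption about how the 36% was reached:

```python
FORWARDS = 277   # forwards with 3000+ minutes in the linked table
RANK = 50        # 50th from the top and 50th from the bottom

top_pct = 100 * RANK / FORWARDS          # share of forwards at or above the 50th best
bottom_percentile = 100 - top_pct        # where the 50th best sits, counted from the bottom
in_tails_pct = 100 * 2 * RANK / FORWARDS # both tails combined

print(f"top tail {top_pct:.1f}%, 50th best at percentile {bottom_percentile:.1f}, "
      f"both tails {in_tails_pct:.1f}%")
```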

Yeah except he was looking at 2330 players and you’re looking at significantly less… So you’re once again moving the goal posts to come to the conclusion that you want to.

And you still haven’t explained why in year 2 the difference between cf% and gf% decreased significantly to the point of mirroring each other even at the extreme ends of the spectrum.

“And you still haven’t explained why in year 2 the difference between cf% and gf% decreased significantly to the point of mirroring each other even at the extreme ends of the spectrum.”

Because over small sample sizes GF% fluctuates significantly enough that the difference between GF% and CF% is not overly representative of any particular type of player when only considering one season. Group players like Tom Awad did in the Hockey Prospectus article (http://www.hockeyprospectus.com/puck/article.php?articleid=625) and you see how important the percentages are.

“Yeah except he was looking at 2330 players and you’re looking at significantly less… So you’re once again moving the goal posts to come to the conclusion that you want to.”

I am not moving the goal posts, I am increasing the sample size to attempt to remove randomness and isolate true talent. Just because season over season variability is significant does not imply there isn’t a significant talent present that we can’t isolate from that randomness. I am attempting to increase the sample size to attempt to isolate that skill. When you do that you find that shooting percentage is a significant skill.

If I toss a 60/40 heads/tails weighted coin 10 times and it comes up 6 heads and 4 tails and then toss it again 10 more times and it comes up 4 heads and 6 tails I can’t conclude anything about the true weight of the coin but that doesn’t mean the coin isn’t biased towards heads. But if I increased the number of coin flips significantly and instead of looking at 10 vs 10 flips over 6 ‘seasons’ I looked at 60 flips I can much more accurately assess the weight of the coin.
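
The coin example can be made exact with the binomial distribution. The sketch below computes how often a coin weighted 60/40 towards heads shows heads on half the flips or fewer, first over 10 flips and then over 60. The function name is made up for this example.

```python
from math import comb

def prob_heads_at_most(k_max, flips, p_heads):
    """Exact binomial probability of seeing k_max or fewer heads in `flips` tosses."""
    return sum(
        comb(flips, k) * p_heads**k * (1 - p_heads) ** (flips - k)
        for k in range(k_max + 1)
    )

P_HEADS = 0.6
p_10 = prob_heads_at_most(5, 10, P_HEADS)    # looks fair or tails-heavy after 10 flips
p_60 = prob_heads_at_most(30, 60, P_HEADS)   # looks fair or tails-heavy after 60 flips

print(f"10 flips: {p_10:.3f}   60 flips: {p_60:.3f}")
```

The probability of being misled shrinks as the flips accumulate, which is the point of pooling six "seasons" into one sample.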

The problem is there have been plenty of studies including this one that show cf% correlates more highly with gf% in the long run than regressed sh%. Like those Broad Street Hockey articles where you argued with them and showed little understanding of regression. Cherry-picking a few examples on opposite ends of the bell curve doesn’t really disprove anything as far as the importance of corsi. Yes there are players who out perform their corsi (Kessel, Crosby). Tyler mentioned that in this article. They are RARE and they are top 5 players in the world. You’re not gonna convince me that Kessel or Crosby couldn’t improve their respective gf% if their cf% went up as well. Nobody is saying that their shooting talent is unimportant, they’re saying that once you get toward the middle of the bell curve you don’t see as many of those types and as such cf%/gf% closely mirror each other.

You can’t just cherrypick Neal/Neil and say that it disproves the work above. One is a much better possession player than the other. I could select Brendan Morrow and Logan Couture and come to the conclusion that sh% doesn’t matter using that logic.

I am going to continue this at the bottom of the comments.

Would Fenwick have been better here? If you can get more unblocked shots wouldn’t that number be closer to the GF%?

Good work, Tyler, very convincing. I have a similar question as Colton, about not just FF% but also SF%. If you did a similar study for each metric, how similar are the outcomes & is any one of them clearly superior?

I too would like to see evidence that corsi has significantly greater predictive power than SF%.

Alt. Headline: “Dave Nonis Thought He Figured Out Shot Quality. What Happened Next Will Break Your Heart.”

No, the outliers, guys like Crosby, can manipulate the stats based on their skill. To me, this demonstrates that shot quality does exist in the numbers.

You’re right, but Crosby’s effect on possession is phenomenal as well.

Yes but the effect that shot quantity has is much larger than shot quality.

Really good post — brief, but didn’t need to be any longer, it was convincing.

Posts like these, that is ones that are accessible and understandable, are the ones that will start to bridge the gap between the average fan and the stat community and bring this kind of thinking to the mainstream.

When you’re writing for “broader audiences” . . . .

a) You need to label all axes better (more consistently). Your x-axes are falling short in the examples given.

b) Never use terms like Corsi or Fenwick without prefacing them with either of the words “team” or “individual”.

If the intent here was to present a clear cut reason why the non-math guy should pay attention to corsi% and GF%, then I believe you have failed. It needs to be simpler. I think the only people who will make it past the second paragraph are those who already believe what you are saying.

There you have it Tyler, break out the crayons.

I was thinking more along the lines of finger paints. They taste better, and don’t hurt as much when i stick them in my ear.

I’m not sure what you have essentially tried to show here is really what you want to say. Essentially you are saying that CF% is good because it correlates well to GF%. Well, GF% has a direct mathematical relationship to +/- ( +/- = TG * (2GF% – 1) where TG = total goals). Hence, you are demonstrating that player CF% has a good correlation to +/-.
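
The identity holds when GF% is written as a fraction (0.525 rather than 52.5) and +/- is taken to be goals for minus goals against over the same set of goals. Since TG = GF + GA and GF% = GF / TG, it follows that GF − GA = 2·GF − TG = TG · (2·GF% − 1). A small check with hypothetical goal totals:

```python
from math import isclose

def plus_minus_from_gf_pct(total_goals, gf_fraction):
    """Goal differential implied by total goals and GF% expressed as a fraction."""
    return total_goals * (2 * gf_fraction - 1)

goals_for, goals_against = 42, 38          # hypothetical on-ice totals
total_goals = goals_for + goals_against
gf_fraction = goals_for / total_goals      # 0.525, i.e. a GF% of 52.5%

direct = goals_for - goals_against
via_formula = plus_minus_from_gf_pct(total_goals, gf_fraction)

print(direct, round(via_formula, 6))
```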

Nevermind. That is exactly what you wanted to say. Corsi% is just more consistent than GF%, and therefore a better indicator over shorter periods – say one season. (However, GF% and +/- would be good indicators for career numbers?)

Just an FYI, you need to be careful about how you use statistics and inferences. If you take any two comparative statistics (i.e. they attempt to measure the same thing) and compute the difference (in this case per player), then take the extremes of those measures and compare the exact same difference for those players in the following year, you should expect the difference to decrease. This is simply a permutation of the law of averages, one of the most fundamental of all statistics.

The same principle applies after every rookie season where we identify who had the ‘sophomore slump’. Let’s say that, amongst all rookies, there were the 10 best who had roughly the same skill level and over the course of their careers would score the same amount of points. In every season, some players will have a ‘career’ and/or ‘lucky’ year and will outperform their average capabilities. Thus amongst those ten, one would happen to score a few more points and win the Calder. Next year it would be unlikely for that same player to outperform his average and he would do less well. Because of a lack of understanding of the law of averages, we incorrectly identify the one rookie who had a ‘career’ year as better than he is and then extrapolate that he had a ‘sophomore slump’ the following year. Law of averages, over time outliers disappear.
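
The rookie scenario is easy to simulate. In this sketch all ten rookies share the same true skill and differ only by luck; the skill level of 50 points, the luck spread of 8 points and the 5,000 repetitions are arbitrary assumptions, and picking the top scorer stands in for winning the Calder.

```python
import random
from statistics import mean

random.seed(1)     # synthetic data only
TRUE_SKILL = 50    # hypothetical points per season, identical for every rookie
LUCK_SD = 8        # hypothetical season-to-season luck

def season_points():
    """One season's points: shared true skill plus independent luck."""
    return TRUE_SKILL + random.gauss(0, LUCK_SD)

def one_trial(n_rookies=10):
    """Return the top rookie's Year One points and the same player's Year Two points."""
    year_one = [season_points() for _ in range(n_rookies)]
    winner = max(range(n_rookies), key=year_one.__getitem__)
    year_two = [season_points() for _ in range(n_rookies)]
    return year_one[winner], year_two[winner]

trials = [one_trial() for _ in range(5000)]
winner_y1 = mean(y1 for y1, _ in trials)
winner_y2 = mean(y2 for _, y2 in trials)

print(f"top rookie: Year One {winner_y1:.1f}, Year Two {winner_y2:.1f}")
```

The top rookie's second season falls back to the shared skill level even though nothing about the player changed, because he was selected for a lucky first season.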

Further, with your argument, you could use this exact same analysis to make the exact opposite conclusion. Why does the fact that when there is a big difference one year and they even out the following year, make one of the statistics (GF) a liar, and the other (CORSI) gospel? Why isn’t CORSI a liar and GF gospel?

The reason, if I am reading this correctly, is that GF% has regressed back towards Corsi%, while Corsi% stays somewhat consistent (i.e. not regressed towards GF%).

Exactly. One stays more consistent due to the large sample size and the other fluctuates much more wildly.

This is an exceptional lecture at the University of Alberta with the same noble aim as you have in this article: http://new.livestream.com/aict/mact/archives

Well worth the time, and dumbed down enough that I can take useful notes with my crayons. Plus, you can pause it. Really really good stuff.

“In other words, they want the percentage of the goals that their team scores when they’re on the ice to be above 50%, something known as GF%. The greater your Corsi%, the greater the chance that that’s true for you.”

Stopped reading right there. English gobbledyg00k to match the pseudo-math gobbledygook.

Stick to Goals, Wins and Stanley Cups.

Art -

You might want to find another site to read.

You might try writing in English.

To really illustrate the value of Corsi%, you really need to add a third chart to demonstrate that GF% is the liar not Corsi%.

Also: In Chart 1, why did you break your players into 2.5% intervals instead of just doing a scatter plot of GF% vs Corsi% with a trend line to indicate strength of relationship? This would also allow us to see the outliers and to see the actual range (i.e., how many individuals are there below 45% and above 55%?). As it is, your chart uses one percentage (% of players) to articulate the relationships between two other percentages, each of which describes the relationship between several variables (goals scored, shots taken, blocked shots, missed shots). Three different % on one chart is not exactly intuitive for the non-Corsi converts you are courting.

“Corsi% is obviously fantastic, dude. It is obvious if you consider that 80.9% of players with a Corsi of 55+% have a GF% greater than 50+%, while only 16.4% of Players with a Corsi below 45% have GF% greater than 50%.”

Yeah . . . that’s going to convince my grandma.

This is in response to Kris above.

“You’re not gonna convince me that Kessel or Crosby couldn’t improve their respective gf% if their cf% went up as well.”

That is not what I am trying to convince you of. Yes, if they improve CF% they will improve GF%. I am not trying to convince you that CF% is not a useful talent. It is. What I am trying to convince you is that Sh% is a significant talent and not just the “top 5 players in the world” as you put it.

“Nobody is saying that their shooting talent is unimportant.”

That is exactly what you are saying when you write:

“Yes there are players who out perform their corsi (Kessel, Crosby). Tyler mentioned that in this article. They are RARE and they are top 5 players in the world. ”

You can’t have shooting talent being important but not have any significant divergence between CF% and GF%. They are mutually exclusive. You can either say GF% closely mirrors CF% and thus shooting talent is unimportant or you can say shooting talent is important and thus GF% won’t closely mirror CF%.

You are playing both sides of mutually exclusive events when you say “Nobody is saying that their shooting talent is unimportant” and “cf%/gf% closely mirror each other.”

” I could select Brendan Morrow and Logan Couture and come to the conclusion that sh% doesn’t matter using that logic.”

Interesting choice of players because those two players have vastly different on-ice shooting percentages (9.88% vs 6.95% over previous 6 seasons). That is significant.

Also, Morrow’s 6-year GF% is 53.1 and his CF% 49.1 while Couture’s GF% is 58.6 while his CF% is 54.5 both showing a divergence of 4%. Are these two players your examples of how GF% and CF% closely mirror each other over the long term because they aren’t overly good examples?

“You are playing both sides of mutually exclusive events”

I’m saying that you’re looking at 5 or so players and extrapolating it to the entire NHL. Those 5 or so select few players might consistently outperform their corsi. The other 2325 probably aren’t. Hence the results of Tyler’s work.

“Also, Morrow’s 6-year GF% is 53.1 and his CF% 49.1 while Couture’s GF% is 58.6 while his CF% is 54.5 both showing a divergence of 4%. Are these two players your examples of how GF% and CF% closely mirror each other over the long term because they aren’t overly good examples?”

No it isn’t, look at your own site. Morrow’s 6 year gf% is 49.5 despite a ~3% higher shooting % than Couture’s. If his cf% is 49.1 then that’s pretty close.

“The other 2325 probably aren’t.”

There aren’t that many players in the NHL. That is how many Y1vsY2 (smallish sample size) comparisons he can make over the span of 6 seasons. Had he looked at fewer Y123 vs Y456 (larger sample size) comparisons the results would be different.

Also, I am not looking at “5 or so” players. I looked at 277 forwards (approximately 9 per team, or top 3 line players) and took the 50th best (Neal) and 50th worst (Neil) on-ice Sh% players and found a significant (30%) difference. That means the 49 ahead of Neal and the 49 below Neil will also have significant (>30%) differences between them. That is far more than the “5 or so” you accuse me of looking at.

“No it isn’t, look at your own site. Morrow’s 6 year gf% is 49.5 despite a ~3% higher shooting % than Couture’s. If his cf% is 49.1 then that’s pretty close.”

Morrow’s 5 year CF% is 47.8 so almost 2% points lower than his GF%.

You’re still not getting that the sample size for gf% and sh% data, in order to be predictive, needs to be large. Like not 5 years of player data large, much larger than that. Tyler’s analysis demonstrates pretty clearly that cf% and gf% pretty closely mirror each other as you increase the sample size so I don’t see why further increasing it would lead to the opposite conclusion as you’re trying to imply. The problem is you keep trying to use a stat that has sample size issues (on ice sh%) and is prone to a lot of fluctuation even after 3000 minutes of data and try to imply that it’s more predictive than corsi. You also completely ignore the impact of corsi on James Neal vs Chris Neil etc and chalk it up entirely to shooting talent. Obviously you can tell from watching them one is better than the other in terms of shooting and scoring but even if you were to regress both to league average shooting you’d come up with better predictions for both than if you were to completely ignore corsi and simply use sh%. Which is what the folks at Broad Street Hockey told you a year ago.

“You’re still not getting that the sample size for gf% and sh% data, in order to be predictive, needs to be large. Like not 5 years of player data large, much larger than that.”

1. Where is your evidence for that? Who says that 3000 minutes is not enough to minimize randomness enough to be able to determine talent?

2. A list of forwards sorted by 5 or 6 year shooting percentage is highly ordered. Players we think of as elite offensive players at the top, players who are third line defensive players at the bottom. This doesn’t happen due to randomness and the spread among the players is significant.

“Obviously you can tell from watching them one is better than the other in terms of shooting and scoring but even if you were to regress both to league average shooting you’d come up with better predictions for both than if you were to completely ignore corsi and simply use sh%.”

1. Have you calculated what I should regress each of those players based on 6 years of data? If not, don’t make assumptions.

2. There are no guarantees that regressed data is any better for any particular player. For any particular player it could very well be worse. The theory is that over the population more often than not the regressed will be better but no guarantees for any particular player.

3. I have never condoned only using shooting percentage, only not ignoring it.

Finally, what you have never addressed is Tom Awad’s work where he grouped players and found that shot quality (shot location) and shooting talent (shooting percentage) are combined more important than shot generation (corsi) when determining what makes good players good.

So once again, any study that successfully minimized sample size issues with shooting percentage has found that shooting percentage is as much or more important than corsi in differentiating player value.

You’re shifting the burden of proof on me lol. Amazing.

Others have established that 3000 minutes isn’t enough through various studies and shown that it takes longer. They’ve shown that shot % isn’t exactly repeatable and is highly random even over a 5 year sample size.

Yes the theory is that regressed leads to better conclusions. It’s a pretty basic concept and it works for the vast majority of the bell curve. Using a handful of outliers to handwave it away is silly but that’s basically what you’re doing.

The Tom Awad article has some interesting data but I don’t think you can draw the conclusions that he does. Others have shown that corsi is more repeatable than delta and more predictive.

edit: I was looking at 5 year for Morrow. http://stats.hockeyanalysis.com/ratings.php?disp=1&db=200813&sit=5v5&pos=skaters&minutes=3000&teamid=0&type=goals&sort=ShPct&sortdir=DESC

Being a scientist and engineer I love the analytics to provide some better insights into a great game! However, the analytics crew in the hockey world could really take a few tips from old Edward Tufte (http://www.edwardtufte.com/tufte/books_vdqi). His ideas would sharpen up many of the graphs seen on hockey nerd blogs.

Oh ya, your corsi% and GF% graph is a super strange way to show a simple linear regression. Just plot corsi% vs. GF% in a scatter plot and put a simple linear regression through it and display the R^2 value (amount of variation in the data which is explain by the line).
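
For reference, the fit described here takes only a few lines. The sketch below runs an ordinary least-squares line through synthetic Corsi% and GF% pairs and reports R^2. The data are made up (Corsi% spread of 3, GF% noise of 6, 500 players), the function name is hypothetical, and the scatter plot itself is left out so the example has no plotting dependency.

```python
import random

random.seed(2)  # synthetic data only; these are not real NHL numbers
corsi = [random.gauss(50, 3) for _ in range(500)]
gf = [c + random.gauss(0, 6) for c in corsi]

def linear_fit(xs, ys):
    """Ordinary least squares for y = slope * x + intercept, plus R^2."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)   # share of GF% variance explained by the line
    return slope, intercept, r_squared

slope, intercept, r_squared = linear_fit(corsi, gf)
print(f"GF% = {slope:.2f} * Corsi% + {intercept:.2f}, R^2 = {r_squared:.2f}")
```

Passing real per-player Corsi% and GF% lists to `linear_fit` would give the trend line and R^2 for the actual data.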

Keep up the good work!
