• Percentiles

    by Tyler Dellow • May 16, 2007 • Uncategorized • 20 Comments

    I’m having a lot of fun looking at percentiles – they provide a useful sense of scale. In a related story, I’m a dork.

    Assuming that last year was similar…wow, did Shawn Horcoff ever have a great season in 2006-07. He would have been over the ninetieth percentile in terms of ES offence. The much maligned Raffi hits just below the seventieth percentile in everything this year, except shots, oddly enough, where he just scrapes over the sixtieth. Danny Briere is the guy at the top end.

    The Oilers defence doesn’t come off as poorly as you might think. Staios was an 80th percentile guy in terms of points. Tjaggy was alright, 70th percentile-ish. Tom Gilbert put up spectacular numbers in the limited time he played – 1.61 ESP/60 is a) fantastic, b) unlikely to be sustained and c) small sample. Still though, he looked comfortable with the puck, which isn’t something that you can say for a lot of the other guys. Dan Hejda, who I strongly suspect that they won’t re-sign because they have to find minutes for Laco, Greene, Grebeshkov and, arguably, Gilbert, had 0.83 ESP/60, which is a respectable enough performance, considering that he was playing at altitude. Greene, Laco and Jason Smith are all sub thirtieth percentile. You’d think that you need some defensive chops to survive that way but apparently not.

    RiversQ suggested the 4.0 PPP/60 as a dividing line between good and bad PP performance a long time ago. That might actually be a little low – good and bad are subjective terms but when you’re talking about that as a mid-point, maybe it’s not so good. More interesting to me is Ales Hemsky’s performance. This year, with a buggered up arm and an incredibly uncreative PP that focused on generating low percentage shots from the blue line, he was at 5.45 PPP/60; last year he was well above the 90th percentile, IIRC. He’s a legitimate star on the PP – the Oilers need to run his PP minutes through the roof. Not doing so is negligent.

    If you treat Stoll and Sykora as basically being defencemen for PP purposes, they had solid seasons – both up into the 80th percentile. If I’m Kevin Lowe and I’ve got money to spend to make next year’s PP better, I’m spending it up front.

    I like this chart because it provides an interesting illustration of the drop-off in scoring ability and, maybe, provides some support for the idea that what kills you isn’t paying too much for the elite but paying too much for those right below the elite. Notice how big the drop-off is in all cases from the top to the 90th percentile. It’s far greater than the ensuing drop-offs, except at the very bottom, which mostly consists of the JF Jacques’ of the world, guys who were fringe players who barely played and did nothing.
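
    For anyone who wants to play along at home, here is a minimal sketch of how a per-60 rate and a percentile rank can be computed. The player lines are made up for illustration (they are not the 2006-07 numbers), the function names are hypothetical, and the mid-rank treatment of ties is just one reasonable convention.

```python
from bisect import bisect_left, bisect_right


def per_60(points, minutes):
    """Scoring rate expressed as points per 60 minutes of ice time."""
    return 60.0 * points / minutes


def percentile_rank(value, population):
    """Mid-rank percentile: share strictly below plus half of those tied."""
    ordered = sorted(population)
    below = bisect_left(ordered, value)
    at_or_below = bisect_right(ordered, value)
    return 100.0 * (below + at_or_below) / (2 * len(ordered))


# Made-up (points, minutes) lines, NOT real 2006-07 data.
players = {
    "A": (30, 1000.0),
    "B": (12, 900.0),
    "C": (20, 1100.0),
    "D": (5, 600.0),
    "E": (25, 1200.0),
}
rates = {name: per_60(p, m) for name, (p, m) in players.items()}
ranks = {name: percentile_rank(r, list(rates.values())) for name, r in rates.items()}

for name in sorted(rates, key=rates.get, reverse=True):
    print(f"{name}: {rates[name]:.2f} per 60, percentile {ranks[name]:.0f}")
```

    With only five players the percentiles are coarse; the same two functions work unchanged on a full league's worth of lines.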


    20 Responses to Percentiles

    1. May 17, 2007 at

      I don’t want to say something that is obvious, but those graphs are almost “perfectly” normal. Now skill can be distributed normally (but I think it’s more likely to be skewed), however random variations are (basically) distributed normally.

      Translation: It’s hard to know if the drop off is the result of normal variations or if they represent a true drop off in skill at x% level.

    2. mc79hockey
      May 17, 2007 at

      Hey, I’m not a hardcore math guy, so say obvious things all you want. Am I to take your point as being that the graphs look like the distribution of points is entirely random? Or that I’ve somehow buggered it up?

    3. RiversQ
      May 17, 2007 at

      Cool stuff. Yeah my 4.0 PPP/hr cutoff was pre-lockout and probably went out the window with all the extra PPs and particularly all the extra 5on3 PPs.

      Of course, 50% of 208 forwards is just over 3 forwards per team, so it’s really not too far off. It’s not bad for the dmen either as that cutoff gives you roughly one dman per team.

      Anyway, I originally thought it was closer to a cutoff for elite PP performers prior to the lockout and it appears it might be a little closer to the replacement level.

    4. Showerhead
      May 17, 2007 at

      If one is to take your conclusion to be accurate (that Edmonton should be going for powerplay presence up front) then what of a trade for Maxim Afinogenov? The guy is also a PP machine, not a Ruff favourite, and makes enough $ that Buffalo might think of dealing him and keeping their two star centers.

      Of course none of our roster players make sense for Buffalo, though I’m sure some of our prospects would… and also Mr. Afinogenov would mean even less sandpaper in Edmonton’s lineup.

      I hold by it though: Maxim and Ales would make 2/3 of an amazing NHL 2007 line.

    5. RiversQ
      May 17, 2007 at

      How portable is Afinogenov? I realize he’s a RWer with a LH shot, but so was Samsonov and he apparently couldn’t (or really didn’t want to) play on the left wing. This doesn’t matter for the PP, but since the Oilers are unlikely to find a 1st line LWer and an impact PP player, they’re going to have to kill two birds with one stone.

    6. RiversQ
      May 17, 2007 at

      Whoops scratch that about Samsonov. He’s RH.

    7. RiversQ
      May 17, 2007 at

      Ah, I might as well continue to litter the comments with crap…

      Samsonov’s a RH shot that only plays the LW and the Oilers couldn’t get him to play the RW. Afinogenov’s the opposite but the point remains since the Oilers aren’t dying for a RWer.

    8. May 17, 2007 at

      “Am I to take your point as being that the graphs look like the distribution of points is entirely random? Or that I’ve somehow buggered it up?”
      You haven’t buggered anything up. It’s just that the graphs look identical to random data. Skill can be distributed normally, meaning those graphs could be graphs of skill, but it’s more likely that the large spikes at the end points are simply the result of players who were very lucky. A player who was in the 90th percentile one year is more likely to be in the 70th or 80th percentile the next year. Of course you often have to pay for the luck when you sign a player. So it’s often better to find unlucky players who have skill than choose the skilled players that got lucky (if you can figure out the two groups).
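
      A rough way to see the slide back toward the pack is to simulate it. This is only a sketch under loud assumptions: every player's season is modelled as a fixed skill plus independent luck, both normal, with equal spread. The function names, player count, and seed are all hypothetical.

```python
import random
from statistics import mean


def percentile_ranks(values):
    """Map each value to its percentile (0-100) within the list."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    for position, index in enumerate(order):
        ranks[index] = 100.0 * (position + 0.5) / len(values)
    return ranks


def simulate(n_players=5000, skill_sd=1.0, luck_sd=1.0, seed=2007):
    """Return the top decile's average percentile in year 1 and in year 2."""
    rng = random.Random(seed)
    skill = [rng.gauss(0.0, skill_sd) for _ in range(n_players)]
    # Same skill both years, fresh luck each year (an assumption).
    year1 = [s + rng.gauss(0.0, luck_sd) for s in skill]
    year2 = [s + rng.gauss(0.0, luck_sd) for s in skill]
    pct1, pct2 = percentile_ranks(year1), percentile_ranks(year2)
    top = [i for i in range(n_players) if pct1[i] >= 90.0]
    return mean(pct1[i] for i in top), mean(pct2[i] for i in top)


year1_avg, year2_avg = simulate()
print(f"top decile: year 1 avg percentile {year1_avg:.1f}, year 2 avg {year2_avg:.1f}")
```

      The year 1 stars are still well above average in year 2, just not as far above as their big season suggested. How far they fall depends entirely on the assumed skill-to-luck ratio.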

      My main point, just because a player made it into the 90th percentile (or 80th) doesn’t necessarily mean they are the best.

    9. Showerhead
      May 17, 2007 at

      “My main point, just because a player made it into the 90th percentile (or 80th) doesn’t necessarily mean they are the best.”
      *points at Sheldon Souray*

    10. RiversQ
      May 17, 2007 at

      “My main point, just because a player made it into the 90th percentile (or 80th) doesn’t necessarily mean they are the best.”

      I agree that applies to a single season, but moreso for PP scoring due to the relatively small sample size.

      However, if Tyler looked at 3-4 years worth of PP production and set the cutoff at 400-600 min total PP time, I would expect that the graph would look pretty similar to an entirely theoretical graph of PP skill vs. percentile. It wouldn’t surprise me if the shape was virtually the same either.

    11. May 17, 2007 at

      “However, if Tyler looked at 3-4 years worth of PP production and set the cutoff at 400-600 min total PP time, I would expect that the graph would look pretty similar to an entirely theoretical graph of PP skill vs. percentile. It wouldn’t surprise me if the shape was virtually the same either.”

      Scoring rate error shrinks in proportion to 1/sqrt(t), so to double your accuracy you have to quadruple your number of seasons. So you’re correct to say the graph would look similar with more data.
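
      To put rough numbers on the square-root rule, here is a small sketch. It assumes points arrive as a Poisson process, so the count over t hours has variance rate × t and the observed rate has standard error sqrt(rate / t). The 4.0 points/hour player and the 2.5 PP hours per season are illustrative assumptions, not figures from the data.

```python
import math


def rate_standard_error(rate_per_hour, hours):
    """Standard error of an observed scoring rate under a Poisson model.

    Var(count) = rate * hours, so SE(count / hours) = sqrt(rate / hours).
    """
    return math.sqrt(rate_per_hour / hours)


# Assumed illustrative numbers, not taken from the data set.
RATE = 4.0               # points per PP hour
HOURS_PER_SEASON = 2.5   # PP ice time per season, in hours

for seasons in (1, 4, 16):
    se = rate_standard_error(RATE, seasons * HOURS_PER_SEASON)
    print(f"{seasons:>2} season(s): {RATE:.1f} +/- {se:.2f} points/hour")
```

      Each fourfold increase in ice time cuts the error bar in half, which is why one season of PP scoring says so little on its own.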

      With three seasons I had a random sum of squares of 1 and a non random sum of squares of about 0.69, meaning randomness still accounts for a huge chunk of the variability.

      Three years worth of scoring data [.xls].

    12. sketchy
      May 18, 2007 at

      The NHL is not a normal distribution anyway…take into account retirement, euro players, injuries, etc…try a K/W test on that population – T has provided a reasonable assessment of the 2007 season.

    13. sketchy
      May 18, 2007 at

      2006 season, pardon me. Barley Pops.

    14. May 18, 2007 at

      “The NHL is not a normal distribution anyway”
      Do you mean intrinsically, or is this statement in terms of what we actually see in the data? Because on a theoretical level I would certainly agree; I generally consider skill exponentially distributed. However, most tests on the data show extremely normal data.

      A Kruskal-Wallis test?

      Normality test: reject H0: Data is normal, for alternative HA: Data is not normal when p 10%
      Kolmogorov-Smirnov: P-Value > 15%

      I have insufficient evidence to conclude the data is not normal.

    15. May 18, 2007 at

      Sorry, my ‘less than’ signs created some problems with this site.

      Normality test: reject H0: Data is normal, for alternative HA: Data is not normal when p ‘less than’ 5%

      Normality: PP plus/hour (243 players)
      Anderson-Darling: P-Value = 55.7%
      Ryan-Joiner: P-value ‘greater than’ 10%
      Kolmogorov-Smirnov: P-Value ‘greater than’ 15%

      I have insufficient evidence to conclude the data is not normal.
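
      For the curious, the Kolmogorov-Smirnov idea can be sketched in a few lines. This is not the software that produced the p-values above, and it runs on synthetic data rather than the 243 players. It measures the largest gap between the sample's cumulative distribution and a fitted normal curve, then compares that gap with the commonly tabulated large-sample 5% cutoff for the case where the mean and SD are estimated from the data (the Lilliefors correction, roughly 0.886/sqrt(n)). All names here are hypothetical.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def ks_statistic_vs_normal(sample):
    """Largest gap between the empirical CDF and a normal CDF fitted
    with the sample's own mean and standard deviation."""
    fitted = NormalDist(mean(sample), stdev(sample))
    ordered = sorted(sample)
    n = len(ordered)
    gap = 0.0
    for i, x in enumerate(ordered):
        cdf = fitted.cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        gap = max(gap, cdf - i / n, (i + 1) / n - cdf)
    return gap


def lilliefors_critical_5pct(n):
    """Approximate 5% critical value for large n (tabulated approximation)."""
    return 0.886 / math.sqrt(n)


rng = random.Random(243)
# Synthetic samples of 243 values each, NOT the real PP data.
normal_like = [rng.gauss(4.0, 1.5) for _ in range(243)]
skewed = [rng.expovariate(1 / 4.0) for _ in range(243)]

for label, data in (("normal-like", normal_like), ("skewed", skewed)):
    d = ks_statistic_vs_normal(data)
    cutoff = lilliefors_critical_5pct(len(data))
    verdict = "reject normality" if d > cutoff else "cannot reject normality"
    print(f"{label}: D = {d:.3f}, cutoff = {cutoff:.3f} -> {verdict}")
```

      A heavily skewed sample of this size gets flagged easily, so failing to reject on 243 players does say the data is at least roughly bell-shaped. It does not prove the underlying skill is normal.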

    16. sketchy
      May 18, 2007 at

      Okay so a KW test is not even functional in this case, but I stand by the theory that, by nature, the population is not normal skill wise. The goal of a player in an NHL game is to perform better than his counterparts – but that is not always to score more goals. Player style, coaching style, game situation, all play into the scoring data. If a team is winning by 6 does the coach send out his top PP unit or does he back off and send out some plumbers?

      In a nutshell I think there is a bias somewhere in the data interpretation. The /60 stats are really good but a very effective PP unit plays less time (due to goal scoring) than a mediocre one, which increases the rate faster than if the full two minutes were always played.

      I suppose I could go brush up on my stats and figure it out…

    17. speeds
      May 18, 2007 at

      I’m not a particularly knowledgeable guy with regards to formal stats (I think I took one or two STATS classes in university), but when one says that in the NHL talent is normally distributed what exactly do you mean?

      If the talent among all hockey players in the world is normally distributed, wouldn’t NHL players, more or less, comprise the far right end of the curve? Meaning that the bottom-tier players are “interchangeable”, with the stars as outliers at the extreme right side of the curve? True, the bottom of the NHL talent pool wouldn’t likely be exactly equivalent since some of the best players play elsewhere, but generally speaking the bottom would be more or less “interchangeable”.
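
      One way to picture this, as a rough sketch only: draw a big normally distributed population, keep the very best, and look at the shape of what is left. The 100,000 worldwide players and the 700 league spots are assumed round numbers, and "talent" as a single number is itself an assumption.

```python
import random
from statistics import mean, pstdev


def skewness(values):
    """Population skewness: near 0 if symmetric, positive for a long right tail."""
    m, s = mean(values), pstdev(values)
    return mean(((v - m) / s) ** 3 for v in values)


rng = random.Random(17)
# Assumed sizes: 100,000 players worldwide, the best 700 reach the league.
everyone = [rng.gauss(0.0, 1.0) for _ in range(100_000)]
league = sorted(everyone)[-700:]
median_player = league[len(league) // 2]

print(f"skewness, everyone: {skewness(everyone):+.2f}")
print(f"skewness, league:   {skewness(league):+.2f}")
print(f"worst to median league player: {median_player - league[0]:.2f} SD")
print(f"median to best league player:  {league[-1] - median_player:.2f} SD")
```

      In this toy setup the bottom half of the league is packed into a narrow band of talent while the stars are strung out far to the right, so the league itself would not look like a bell curve.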

    18. speeds
      May 18, 2007 at

      ps. I’m not claiming to create this analysis, I read it elsewhere and it made sense when I read it. I think it was at Tom Benjamin’s site.

    19. sketchy
      May 19, 2007 at

      I don’t think you can have a normal distribution of talent. Talent is not measurable in the respect that you can have 50 litres of goal scoring, a hogshead of defense, or twenty two drams of goaltending.

      I suppose if we looked at ‘talent’ as a limited commodity in the NHL (it is not) we would have the middle 68% of players being nearly interchangeable, 13% being better than average by one factor and 2% being better than the average by 2 factors with a mirror image on the other end (13% worse by one factor and 2% worse by two factors) along with some outlier values (Crosby, Ovechkin, Brodeur) who are once in a generation talents.
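
      Those shares are the usual standard-deviation bands of a normal curve, reading "factor" as one standard deviation (my interpretation). A quick check using Python's standard library:

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, SD 1

within_1 = z.cdf(1) - z.cdf(-1)        # within one SD of average
between_1_and_2 = z.cdf(2) - z.cdf(1)  # one to two SD above (same below)
beyond_2 = 1 - z.cdf(2)                # more than two SD above (same below)

print(f"within 1 SD of average:   {within_1:.1%}")
print(f"1 to 2 SD above average:  {between_1_and_2:.1%}")
print(f"more than 2 SD above:     {beyond_2:.1%}")
```

      That prints 68.3%, 13.6% and 2.3%, so the rounded 68/13/2 split above is close, and the two-SD band already includes the far-out stars.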

    20. Baroque
      May 19, 2007 at

      Not only is there a question of definition of talent (and I was unaware that defense was measured in hogsheads, btw), but there is also the inherent problem of independence.

      If measuring a particular parameter (eg skating speed) you would assume a normal distribution would emerge partially because it is an easy quality to measure with a rink and a stopwatch, but also because flat-out speed of say, Marian Gaborik is independent of the speed of Pavol Demitra. If looking at some amorphous quality such as talent, and reducing it to, say, goal-scoring as a surrogate parameter, then the value for Gaborik is no longer independent of the value for Demitra. When variables are not independent, most statistical tests have a lot of issues. This muddies the statistical waters with a lot of team sports–think about how many times you hear about a quarterback being great if his receivers didn’t have butterfingers, or a pitcher would have a far better record if his run support didn’t stink, or a goaltender wouldn’t look as impressive if the team in front of him didn’t play such good defense.

      Still, considering no one has an iron-clad definition of something as apparently simple as “intelligence” it doesn’t surprise me that “talent” is equally hard to define. The numbers are illuminating, but not the final word, I think. Humans are too quirky and unpredictable of a species.
