Desjardins’ Quality of Competition Numbers

    April 16, 2007

    A lot of people, myself included, have referenced Gabriel Desjardins’ quality of competition numbers lately. It’s great that there seems to be an increased awareness of this sort of thing, but his methodology, found here, warrants some discussion, as I think it has a few problems. I don’t mean to take unfair shots at the guy – I’ve done stuff like this before and it’s not easy – but the methodology would benefit from some improvement. Desjardins explains Quality of Competition as follows:

    How is the ‘Strength of Opponents’ or ‘Quality of Competition’ statistic calculated?

    It is the average On/Off-Ice +/- of the opposing players a player faces. For example, if you lined up against Anaheim’s top line, you’d get:

    Name         Pos  Team   #  On/Off +/-
    KUNITZ       F    ANA   14  +1.97
    SELANNE      F    ANA    8  +1.65
    PRONGER      D    ANA   25  +1.61
    MCDONALD     F    ANA   19  +0.94
    NIEDERMAYER  D    ANA   27  -0.31

    The strength of opponent would be the average of 1.97, 1.65, 1.61, 0.94 and -0.31, which is 5.86/5 = +1.17. In general, if a player matches up against the other team’s first line, he’ll face a high strength of competition.
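    The arithmetic in that example can be sketched in a few lines of Python. This is only an illustration of the single-matchup average described above; the function name is my own and is not taken from Desjardins’ code.

```python
from statistics import mean

# On/Off-Ice +/- values copied from the Anaheim example above.
anaheim_top_unit = {
    "KUNITZ": 1.97,
    "SELANNE": 1.65,
    "PRONGER": 1.61,
    "MCDONALD": 0.94,
    "NIEDERMAYER": -0.31,
}


def quality_of_competition(on_off_by_player):
    """Unweighted mean of the opposing skaters' On/Off-Ice +/-.

    Hypothetical helper name; every skater counts equally.
    """
    return mean(on_off_by_player.values())


qoc = quality_of_competition(anaheim_top_unit)
print(f"{qoc:+.2f}")  # prints +1.17
```

    A full-season number would presumably average over every opponent a player faced, weighted by head-to-head ice time, but that weighting is my assumption and is not spelled out in the passage quoted above.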

    You can see from this that On/Off-Ice +/- is a crucial part of the numbers that he’s cranking out and, accordingly, you’re going to need to know how that’s defined before you can really evaluate what he’s doing here. On/Off-Ice +/- is defined as follows:

    What is On-Ice/Off-Ice +/-?

    It’s the difference between the team’s plus/minus rate when a player is on the ice and when he’s off the ice.

    Why subtract the Off-Ice +/- from regular +/-?

    Hockey’s a team game; a good two-way player on a bad team (Peter Forsberg, Ryan Smyth in 2006-07) will have a lower +/- than most players on a good team like Detroit or Nashville. That’s obviously not an accurate reflection of Forsberg and Smyth’s performance because they’d have a much higher +/- over the course of a season with a good team. Plus/minus relative to the rest of the team’s performance is a more accurate reflection of a player’s ability to score and prevent goals.

    I’ve got some problems with this and, because he uses these numbers as the basis for his quality of competition numbers, some problems with those as well. My first problem lies with this statement: “Plus/minus relative to the rest of the team’s performance is a more accurate reflection of a player’s ability to score and prevent goals.” I don’t think that this is necessarily true. Take the Oilers as an example this year. Ryan Smyth wasn’t on the ice for a single one of JF Jacques’ 0 ESGF and 11 ESGA. Other than the extent to which coaching decisions about their usage were driven by the desire to minimize Jacques’ ice time, I can’t figure out how Jacques’ performance is relevant to our evaluation of Smyth’s. To keep things clean, we’ll assume that they didn’t play together and that’s why there are no events with the two of them on the ice together. What happens on the ice when Smyth isn’t there is irrelevant to how easy or how difficult it is for him to build a good +/-. Smyth is being credited here for something that he can’t control and that apparently didn’t impact him. Since the Quality of Competition numbers are built on the On/Off-Ice numbers, they inherit this flawed assumption.
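    To make the Smyth/Jacques point concrete, here is a rough sketch. The per-60-minutes rate is my assumption about what “plus/minus rate” means, and every goal and ice-time figure below is made up for illustration except Jacques’ 0 ESGF and 11 ESGA.

```python
def rate_per_60(goals_for, goals_against, minutes):
    """Goal differential per 60 minutes (assumed definition of a +/- rate)."""
    return (goals_for - goals_against) * 60.0 / minutes


def on_off(on_gf, on_ga, on_min, team_gf, team_ga, team_min):
    """On-ice +/- rate minus the team's +/- rate with the player off the ice."""
    off_gf = team_gf - on_gf
    off_ga = team_ga - on_ga
    off_min = team_min - on_min
    return rate_per_60(on_gf, on_ga, on_min) - rate_per_60(off_gf, off_ga, off_min)


# Hypothetical on-ice totals for a Smyth-like player.
smyth = dict(on_gf=50, on_ga=40, on_min=1000)

# Hypothetical team totals, first excluding Jacques' minutes ...
without_jacques = on_off(**smyth, team_gf=140, team_ga=150, team_min=3600)

# ... then adding Jacques' 0 GF and 11 GA over an assumed 200 minutes,
# none of them shared with the Smyth-like player.
with_jacques = on_off(**smyth, team_gf=140 + 0, team_ga=150 + 11, team_min=3600 + 200)

print(f"without Jacques' minutes: {without_jacques:+.2f}")
print(f"with Jacques' minutes:    {with_jacques:+.2f}")
```

    The on-ice inputs are identical in both calls, so any change in the result comes entirely from minutes in which the player was sitting on the bench.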

    There’s another issue here as well. Desjardins is effectively assuming that each player’s offensive contribution is equivalent when he does this – he adds together the five players’ On/Off-Ice +/- and divides by five. I doubt that this accurately reflects the contribution – I suspect that forwards, at the very least, impact the offensive side of the +/- equation more than defencemen do. Treating Pronger as a +1.61 player…well, he probably isn’t really; that number is probably driven by the forwards. The whole thing needs to be weighted differently. I’ve got some ideas on how, which I’ll get into at some point in the future.
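    As a placeholder for what “weighted differently” could mean mechanically, here is a sketch of a position-weighted average. The weights are arbitrary numbers picked for illustration only; they are not estimates, and they are not the weighting I have in mind.

```python
# (name, position, On/Off-Ice +/-) from the Anaheim example.
anaheim_top_unit = [
    ("KUNITZ", "F", 1.97),
    ("SELANNE", "F", 1.65),
    ("PRONGER", "D", 1.61),
    ("MCDONALD", "F", 0.94),
    ("NIEDERMAYER", "D", -0.31),
]


def weighted_qoc(players, position_weights):
    """Average of On/Off-Ice +/- with a per-position weight on each skater."""
    total_weight = sum(position_weights[pos] for _, pos, _ in players)
    weighted_sum = sum(position_weights[pos] * value for _, pos, value in players)
    return weighted_sum / total_weight


# Equal weights reproduce the straight divide-by-five average.
equal = weighted_qoc(anaheim_top_unit, {"F": 1.0, "D": 1.0})

# Arbitrary illustration: a defenceman's number counts half as much.
forward_heavy = weighted_qoc(anaheim_top_unit, {"F": 1.0, "D": 0.5})

print(f"equal weights:         {equal:+.2f}")
print(f"forward-heavy weights: {forward_heavy:+.2f}")
```

    Real weights would have to come from the data, for example from the kind of regression on forward/defence splits that would need the full play-by-play database.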

    A third issue. Consider a player who played only against terrible players and therefore had good numbers. The players who played against him would look like they were playing hard minutes when judged by this player’s numbers when, in actuality, they weren’t. There’s a bit of a cyclical thing at work here. My understanding is that you can resolve this by some mathematical trick or another; I just can’t remember the name. It’s something that needs to be brought into this equation.
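    I don’t know that this is the trick in question, but one common way to handle this kind of circularity is to solve for everyone’s rating at once, for instance by fixed-point iteration in the style of strength-of-schedule adjustments for team ratings. The sketch below assumes a toy four-player league; the raw ratings, the head-to-head ice-time shares and the damping factor are all invented.

```python
def adjust_for_competition(raw, shares, damping=0.5, iterations=100):
    """Iterate: adjusted = raw + damping * (ice-time-weighted opponent rating).

    raw:    player -> raw On/Off-Ice +/- (hypothetical values)
    shares: player -> {opponent: share of ice time faced}, each summing to 1
    A damping factor below 1 keeps the iteration from blowing up.
    """
    adjusted = dict(raw)
    for _ in range(iterations):
        # Each pass uses the previous pass's ratings for every opponent.
        adjusted = {
            player: raw[player]
            + damping * sum(w * adjusted[opp] for opp, w in shares[player].items())
            for player in raw
        }
    return adjusted


# Two players with the same raw number but very different opposition.
raw = {"sheltered": 1.0, "scrub": -1.0, "star": 1.0, "rival": 1.0}
shares = {
    "sheltered": {"scrub": 1.0},
    "scrub": {"sheltered": 1.0},
    "star": {"rival": 1.0},
    "rival": {"star": 1.0},
}

adjusted = adjust_for_competition(raw, shares)
for player, rating in adjusted.items():
    print(f"{player:10s} raw {raw[player]:+.2f} -> adjusted {rating:+.2f}")
```

    In this toy setup the sheltered player’s rating is marked down and the star’s is marked up, even though their raw numbers are identical, which is the direction the correction ought to go.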

    Despite my criticisms, I think that what Desjardins is doing here has real value; it just needs some sort of sensible rejigging. It’s a great first step and, from the perspective of someone who would like to see this sort of stuff – hockey stats that are supposed to mean something – catch on, I’m happy to see it. I just think that it needs some work from the logic/hockey side of things.


    20 Responses to Desjardins’ Quality of Competition Numbers

    1. April 16, 2007 at

      I think you’re pretty on here, mc. One thing I might add is to factor in the sixth player (the netminder) also. I assume we’re trying to get at a player’s expected plus-minus or something contextual like that, and the goalie playing in each net should factor heavily in that.

    2. April 16, 2007 at

      I’ve only just recently tried to get my head around Desjardins’ numbers and methodologies. While I think they are, as you say, a step in the right direction, I agree there are probably some issues to be worked out.

      Just looking at the Flames advanced stats – some of the stuff makes a lot of sense and some of it seems questionable – speaking from a 5,000 foot level from the perspective of someone who follows the Flames very closely.

      Some of the revisions you suggest here sound reasonable, but I’m hardly a noted theorist in this area, so…

      Great post though.

    3. RiversQ
      April 16, 2007 at

      That’s a good list of problems. Another issue is that he doesn’t filter out EN GA either. I assume 6V5 goals are still in there as well.

      I exchanged some posts with Desjardins over at HF on this topic a few months ago. The biggest problem I have with Desjardins is that his method kills good players that play against other good players on a routine basis. Guys like Horcoff, Richards and Crosby are playing against each other for the most part and by doing so, they hurt each other’s quality of competition under this approach. Now those guys manage to remain near the top of the list but there are a lot of players like Zetterberg, Modano and Thornton that are near the bottom that I find hard to believe.

      Desjardins has admitted this to be a problem, but it’s a hard one to fix. It seems his underlying assumption is that checking lines still exist to a large extent in the NHL and frankly I don’t think that’s the case. Sure there are exceptions like Anaheim, but by and large a lot of the good players play each other pretty regularly.

    4. April 16, 2007 at

      I have similar stats, but I do a regression on how well the player performs in different pairs (if they make all the pairs better then their score will improve compared to their average). My system only looks at how well you performed while on the ice, so depth players on Jagr’s team won’t be hurt because Jagr is so strong offensively (unless of course they play with Jagr).

      I summarized all this as one statistic on my website (VAL), but I have recently broken it down similar to what Desjardins does.

      I hope to convert this into a regression on triples, but this is going to take 35 hours of data processing (I need a better computer). Triples should fix some of the Buffalo issues as forward lines will be looked at as one data point, not 3.

    5. April 16, 2007 at

      It should come as no surprise that the methodology is flawed. I probably have to invest 20 minutes writing code to parse play-by-play files for every minute I get to spend actually analyzing the data. (Meaning I haven’t had much time to think about analyzing the data.)

      There are a lot of regressions that can be run on this data to figure out better splits between the forwards and the defensemen. I’m hoping to get a look at that after the playoffs are over :) I don’t have an answer for how to resolve the problems “quality of competition” has for good players who face good players.

      I should add that it has always been my intent to make my methodology and data 100% public domain. The more people working on the problem, the more likely one person is to find a better solution. I’d be happy to send the entire database to anyone who’d like to play around with it.

    6. April 16, 2007 at

      Good post mc79. After comparing some Flames/Oilers numbers I don’t think any of the QoC numbers are particularly useful at this time.

      I’m sure this will not discourage Desjardins as he has many ardent supporters (LT comes to mind) and hopefully with a few tweaks the strangeness of some of the calculations will go away.

    7. April 16, 2007 at

      I’d be happy to send the entire database to anyone who’d like to play around with it.

      Tempting, but I doubt I’m the guy to make sense of it.

      Out of curiosity, though, what’s the db look like structurally? How big does it end up being? And is it purely this year’s stats, or does it date back to last season or beyond?

    8. April 16, 2007 at

      After comparing some Flames/Oilers numbers I don’t think any of the QoC numbers are particularly useful at this time.

      I wouldn’t go so far as to say they’re useless either, but I think they could potentially be more powerful with some tweaks.

      I have confidence in the assorted minds out there. I’ll be waiting idly by till then, ready to reap the benefits.

    9. April 16, 2007 at

      Out of curiosity, though, what’s the db look like structurally? How big does it end up being? And is it purely this year’s stats, or does it date back to last season or beyond?

      Ultimately, it’s just a bunch of tables – csv files. There’s goal, shot and icetime data for every game, plus a set of 1000×1000 tables for head-to-head icetime for every strength situation.

      I have databases going back to 1999-2000, but I really only find this year’s data to be interesting since it’s the first full season we have head-to-head icetime for.

      On another note, with the imperfect dataset we have, there’s a limit to what we can accomplish. Consider Alan Ryder’s “shot quality” – he did pretty much all you can given the information in the shot data.

      I spent a lot of time trying to figure out the “best” way to make advanced stats for MLB and NFL, and there’s just too much uncertainty. For baseball, you don’t know positioning before the ball’s in play. For football, you don’t even know who’s on the field.

      I figured I should go with the simplest possible hockey algorithm. We can adjust the linear weights, but I’m not sure you can ever get these kinds of stats to be taken without a grain of salt.

    10. RiversQ
      April 16, 2007 at

      I think it’s great stuff Gabriel. It’s even better because you pretty much have full transparency for your methods. We might sound like we’re whining here, but you have to identify the problems before you can fix them. The next step is to fix them and I’ll certainly be thinking about it.

      It is most definitely not my intention to discourage you. I want to make that perfectly clear.

      In fact I’d say you’ve done as much in the past couple of months to advance hockey data analysis in the Oilogosphere and on Oiler message boards as most of us have done combined over the past 2-3 years. It used to be something taken seriously by a select few and now quality of icetime is something that seems to be widespread in our little part of “teh intarweb.”

    11. lowetide
      April 16, 2007 at

      I think the difficulty is going to be in slotting players over the long haul. Injuries might cause a coach to shelter a Steve Staios for a time and how do we measure that?

      I’d suggest that something along the lines of Bill James’ Project Scoresheet is in order. We’d have to get 60 (2×30, 2 for each club) reasonably intelligent people to fill out “game forms” along with pulling off the shift charts.

      Failing that, we’re all prone to our own bias.

      PS: Gabriel, you rock. :-)

    12. April 16, 2007 at

      I think Gabe has done a great job with this. As others have said, it’s brought the subject into hockey conversations. That’s the most important thing to my mind.

      For the reasons that mudcrutch, and others on this thread, have mentioned … it looks like it shouldn’t work at all. But it seems to. At least relative to teammates, by and large the order seems right to me.

      For some reason some players get caught in a vortex one way or the other, and end up unfairly flattered or shat upon in terms of QofO there. C’est la vie.

    13. April 16, 2007 at

      I have no data to support this notion, but is it different facing Selanne & co. 5-on-5 when Anaheim is trailing vs. when Anaheim is tied or ahead?

      Is there something else that should be factored in other than strictly personnel, like the score? It seems to me there’s times Selanne’s out to get a goal, other times he’s out there not to let one in.

      Just curious if any of that holds in the data, or if the data exists for that kind of question.

    14. April 16, 2007 at

      Earl:

      I think so. Absolutely. Just goes back to the same things we see in a game.

      Jeff at ‘Sisu Hockey’ developed the right framework for that, back before Christmas. He proceeded to use arbitrary constants and then draw contradictory conclusions. But clever execution is no big deal anyways, millions can do that, clever ideas are rarer in my opinion. And he nailed that, or so I think.

      He sent me an email inviting my criticism, presumably he did the same to many others as well. Mad shit is that. As I type this I’m wondering about Jeff’s childhood. I probably should have responded, as should have others. Whatever.

      Slipper regularly nails a lot of the things that the counting numbers don’t capture as well, thankfully he’s crap at math, and personally I’m in two minds as to whether or not him and Desjardins chatting via email is a good thing. I dunno.

      I don’t know how many rocks we need to lift anyways. These are the salad days of the Oilogosphere. Considered opinions are common around here now, noise is low.

      It’s a balance. Things are good. Enjoy the sunshine, Earl. :D

    15. April 16, 2007 at

      [thanks guys!]

      Earl, I don’t know if you saw the original Protrade site, but they were trying to analyze football and baseball using the game state. I put the same thing together for hockey; the relative difficulty of the shift was determined by a team’s expected winning percentage based on the score and time remaining. Very similar to those expected runs matrices you see for baseball.

      I found it was difficult to understand the results, and it really favored guys who played the last couple of minutes in close games and in overtime.

      I have to admit I was won over by Roland Beech’s take on basketball: either you have a fairly simple +/- view of the game; or you need in-depth charting. His staff charts every pass now, and they make money off it – but it’s a huge undertaking for hockey…

    16. April 16, 2007 at

      It’s not that I’m so bad at math, but I’m definitely a shit for brains when it comes to computers.

      I figure that if I email every person I find on the internet that gabs about hockey numbers and act reasonably enthusiastic about doing some heavy lifting myself, I’ll eventually be able to persuade one of them to publish a graph plotting +/- events in relation to shift length. After that there’s nothing more about hockey I really need to know.

      Christ, it took me two years to get a shift chart reader on my computer and by the time I finally understood how to cut and paste the bitmaps into the bloody thing there’s an entire fucking webpage that offers it at the click of the button!

      I mean why can’t some of you people just give a heads up about what’s heading down the pike so I don’t waste the few precious brain cells I have left trying to fuse creation and science together?

    17. lowetide
      April 16, 2007 at

      Slipper:

      You may be my son. :-)

    18. mc79hockey
      April 16, 2007 at

      Cool discussion. Given that there’s someone with a solid DB and the technical know-how here now and a bunch of people who have some sort of grasp of the ideas at play, I’d be interested in kicking around some ideas on this if people are up for it. I’m sympathetic to Gabriel’s point as it relates to the time to track the data versus the time to work through it logically – more people kicking around ideas on the methodology probably wouldn’t be a bad thing, if he’s so inclined.

      I take Earl’s point (btw, if you haven’t read his post about Minnesota v. California, head over to his site and do so) about the situational stuff but for me, right now, I think it’s enough of a battle just to come up with some sort of rational overall numbers. That’s just me though – Americans do love their situational statistics.

    19. April 17, 2007 at

      Anybody, who wants the PBP and TOI database, email me (info at behindthenet dot ca) and I’ll send it out when the playoffs are over with some useful documentation.

      Slipper – email me the specifics of your idea. I think we can hammer that out and get you moved on to watching the NBA.

