• Open Source the Oilers

    by Tyler Dellow • October 10, 2011 • NHL • 31 Comments

    I sat down this afternoon and recorded all of the Oilers’ touches for the first ten minutes of the game. (Coincidentally, Jonathan Willis has just written a post on something similar that he did last night.) The good thing about this: I think you pick up a lot of interesting stuff. The problem: it takes bloody forever to do.

    I’m wondering whether there’s any appetite to build a group of people to do this. It took me about two hours to do ten minutes of game time; I would expect it would be less once I had some more experience at it. I divided the rink into 24 zones and recorded where each event started and ended. I did, I think, come up with some interesting stuff, even in only ten minutes. I was recording what happened with the puck when a player touched it and where he touched it.

    So, for example, Oilers defencemen attempted 11 passes in their own end in the first ten minutes at EV. Petry tried one, Barker, Smid and Sutton two and Gilbert four. They went 7 for 11 with the passes. Sutton went 0/2 – watching it this way made me notice how he, in these ten minutes at least, tended to be in less of a good position to make a pass, as both of his passes had him basically trying to play it off the boards blind to a winger he knew was there. Gilbert was 3 out of 4 and, to my eye anyway, seemed to consistently be in a better position to make a play with the puck. If that 3/4 to 1/2 difference were to continue, that’s a big deal, I’d think. With this sort of data, we could start to ask questions like what influence a decreased success rate in passing out of one’s own end has on goal data.
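To make the arithmetic above concrete, here is a minimal Python sketch of how completion rates could be tallied from the counts quoted in the post. The dictionary layout and the `completion_rate` helper are assumptions for illustration, not the spreadsheet actually used. The post does not give individual splits for Petry, Barker and Smid, so they appear only as a combined remainder.

```python
# (completed, attempted) own-zone EV passes, as quoted in the post.
known_passes = {
    "Gilbert": (3, 4),
    "Sutton": (0, 2),
}
TEAM_COMPLETED, TEAM_ATTEMPTED = 7, 11  # all five defencemen combined


def completion_rate(completed, attempted):
    """Return the share of attempted passes that were completed."""
    if attempted == 0:
        raise ValueError("no attempts recorded")
    return completed / attempted


# Whatever is not attributed to Gilbert or Sutton belongs to
# Petry, Barker and Smid together (individual splits are not in the post).
rest_completed = TEAM_COMPLETED - sum(c for c, _ in known_passes.values())
rest_attempted = TEAM_ATTEMPTED - sum(a for _, a in known_passes.values())

for player, (completed, attempted) in known_passes.items():
    print(player, completion_rate(completed, attempted))
print("Petry/Barker/Smid combined", completion_rate(rest_completed, rest_attempted))
print("Team", completion_rate(TEAM_COMPLETED, TEAM_ATTEMPTED))
```

With samples this small the rates are only descriptive; the point is that the same tally scales once more games are recorded.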

    Whenever a player skated a puck from one box on the ice to another, I recorded a “skate” event. I determined success based on whether the player lost the puck – if he chipped it in or attempted a pass or a shot, it was successful. There were 18 “skate” events at EV in the first ten minutes. The only guy who had unsuccessful ones was Eric Belanger, with both of his ending in him losing the puck. Dennis commented over at Lowetide’s that he noticed Belanger labouring and that there was some mention of a problem with his back.
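One way the recording scheme described so far might be represented in code is sketched below. The field names, the 1 to 24 zone numbering and the `lost_puck` flag are assumptions made for illustration; the post only says that the rink was divided into 24 zones, that each event has a start and an end location, and that a “skate” counts as successful unless the player lost the puck. The event types are the ones listed in the comments (pass, touch, shot, dump in, skate).

```python
from dataclasses import dataclass

EVENT_TYPES = {"pass", "touch", "shot", "dump_in", "skate"}
NUM_ZONES = 24  # the post divides the rink into 24 zones; numbering is assumed


@dataclass(frozen=True)
class Event:
    """One puck touch. Field names are hypothetical, not the actual columns."""
    player: str
    kind: str
    start_zone: int
    end_zone: int
    lost_puck: bool

    def __post_init__(self):
        if self.kind not in EVENT_TYPES:
            raise ValueError(f"unknown event type: {self.kind}")
        for zone in (self.start_zone, self.end_zone):
            if not 1 <= zone <= NUM_ZONES:
                raise ValueError(f"zone out of range: {zone}")


def skate_successful(event):
    """A skate succeeds unless the carrier lost the puck (rule from the post)."""
    if event.kind != "skate":
        raise ValueError("not a skate event")
    return not event.lost_puck


# Hypothetical rows, for illustration only (zones are made up).
carry = Event("Smyth", "skate", start_zone=10, end_zone=14, lost_puck=False)
lost = Event("Belanger", "skate", start_zone=9, end_zone=10, lost_puck=True)
print(skate_successful(carry), skate_successful(lost))
```

Validating the zone and event type at entry time is a design choice meant to catch typos early when several people are entering data.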

    The Oilers gained the blue line five times at EV during the first ten minutes (there was another offside). Smyth and RNH both gained the blue by skating it in, Omark got a touch on a pass from Petry that ended up in the offensive zone, Jones dumped it in once and Hall passed to RNH to gain the blue line. With a more substantial data set, you could start to look at outcomes based on how the blue line was gained.

    There is, in short, a ton of stuff that could be tracked and who knows what sort of investigations could be done on it. As indicated above, there’s a problem: it takes bloody forever to do. There’s also a solution though, I think, which is sharing the recording work. At a bare minimum, I think I’d want 15 people or so; that’d be enough for a group each doing four minutes a game, which should take 30 minutes or so once people got the feel for it.
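The workload arithmetic can be sketched in a few lines of Python. This assumes a 60-minute regulation game split evenly among trackers; the function names are made up for illustration. The two pace figures come straight from the post: about two hours of work for ten minutes of game time on the first attempt, and a hoped-for 30 minutes per four-minute segment with practice.

```python
GAME_MINUTES = 60  # regulation time; overtime is ignored in this sketch


def split_game(n_trackers, game_minutes=GAME_MINUTES):
    """Return (start, end) game-minute segments, one per tracker."""
    segment = game_minutes / n_trackers
    return [(i * segment, (i + 1) * segment) for i in range(n_trackers)]


def work_minutes(segment_minutes, work_per_game_minute):
    """Minutes of tracking work needed for one segment at a given pace."""
    return segment_minutes * work_per_game_minute


segments = split_game(15)     # 15 trackers -> four game minutes each
first_pass_pace = 120 / 10    # two hours of work per ten game minutes
hoped_for_pace = 30 / 4       # 30 minutes of work per four game minutes

print(segments[0], segments[-1])
print(work_minutes(4, first_pass_pace), work_minutes(4, hoped_for_pace))
```

At the first-attempt pace a four-minute segment would take closer to 48 minutes, so the 30-minute estimate depends on people speeding up with experience.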

    Is there any interest? I’m open to discussing the parameters of what gets tracked and what doesn’t. If you don’t want to comment but are interested in participating anyway, feel free to email me at mc79hockey@gmail.com. If you think “Gee, this is something I’d be interested in, but I’d want to have a place to write about it where people might see what I think,” don’t worry – within reasonable limits, I can provide that space.

    31 Responses to Open Source the Oilers

    1. Stratedge
      October 10, 2011 at

      I find this interesting and would like to take part, but before doing so would like to have a fast and furious discussion of the methodology before we get too far down the road.

      Some early thoughts:

      - Does something in this process document the circumstances under which an event began? Do you differentiate between a defender taking control and making a pass after a dump-and-change, versus one who grabs a loose puck while on the PK and makes a pass under significant duress? If so, how?

      - Method of data entry? Do you have a DB for this?

      - What are the events? The offensive events would seem relatively straightforward, but do you attempt to record defensive events as well? (insert Horcoff joke)

      • Tyler Dellow
        October 10, 2011 at

        -Agree on a discussion re methodology being important.

        -Events I was recording were pass, touch, shot, dump in and skate. Basically whenever a guy touched the puck, I made a note. I’m open to discussing what we record and how we determine whether it’s successful. Puck battles might also be a good one.

        -I was recording this in Excel. Ideally, we’d come up with a DB of stuff. I just had six or seven columns in which I recorded the info.

        -Nothing differentiating circumstance as of yet. If someone has ideas as to how, I’m all ears.

        I’ve had an email expression of interest too – that’s three of us. Not bad for Turkey day.

        • October 10, 2011 at

          I can’t help with the actual reviewing, as I lack reliable access to the games and have no recording devices anyway (hampered me with helping Staples out too, and that hasn’t been fixed).

          But if you need a technical hand, I’d be happy to help there. Google Docs or Excel may be your best bet anyway, if only cos it’s simpler, but you never know.

          Whatever, this seems like an interesting project.

    2. Stratedge
      October 10, 2011 at

      Oh… And I think the 4 zones behind each of the goal lines should be numbered separately.

      • Tyler Dellow
        October 10, 2011 at

        Heh. I decided the same thing about six minutes in to the game. Obvious first change.

    3. Woodguy
      October 10, 2011 at

      Excellent stuff Tyler, I look forward to looking at the results.

      I could handle a segment when not travelling if you don’t need the data right away. Is within 24 hours of the game ok?

      I look forward to having a new set of data with which I can use to justify my narratives with spurious correlations.

    4. October 10, 2011 at

      I’d be willing to help but I don’t currently have access to recorded copies of the game (no PVR, etc.). Do you have a link I can dl the games from? If so my time is your time.

    5. Tyler Dellow
      October 10, 2011 at

      I’ve had some good response to this – a couple emails too. I’m going to Twitter promote it again tomorrow.

      Don’t worry if you don’t have a PVR or if it would sometimes take you a while to get to a game. I’ve got an NHL Vault account that can be devoted to the project.

    6. mclea
      October 10, 2011 at

      If you just recorded turnovers (who turned it over, what zone the turnover occurred in, who caused it, whether it resulted in a scoring opportunity) you could capture and record substantially all the significant events in a game in a fraction of the time. And you could also do it with an excel spreadsheet while watching the game live.

      I’d volunteer to do this if I had some company to weed out the mistakes/differences of opinion.

      • Till_Horcoff_Is_Coach
        October 10, 2011 at

        I think turnovers would miss a lot of data – in particular let’s call it the Smid and CFP factors.

        Smid looks fine when relied on to do defense, but starts to show some warts when expected to move the puck forward. Showing that the puck continuously cycles to guy A which then results in X turnovers and Y successes is much more meaningful than just the X and Y of turnovers for each player.

        As for CFP: players of his ilk not only make the right play, they take that extra second or make the extra move to attract the forecheck and give their partner more time to accept the puck. This aura effect would also be hard to capture on just turnovers but would be possible by something that Tyler is proposing.

        Both would be hard to suss out of the data no doubt, but they would be possible. Then again, perhaps I’ve lost my mind from the food.

        Tyler: I don’t have the time to commit to the project, but I could do some grunt work on ways to vastly simplify the input process. Email me if you’re interested and we can toss some ideas around.

        I hope the project moves forward so we can see what is derived from it… there’s a lot of possibilities.

        Cheers.

    7. mclea
      October 10, 2011 at

      I explain my rationale for tracking turnovers in the comments to this post:

      http://www.mc79hockey.com/?p=3066

    8. Julian
      October 10, 2011 at

      a) I’m curious, I’d be interested in helping out with this a bit I think, though I already offered Gabe help in scoring Jets games this year a bit.

      b) you’re a soccer guy… what do you think of wearing team canada hockey sweaters to the canada v puerto rico soccer game tomorrow night? I’m probably going with some friends, but I really don’t have anything soccer/canada related… lots of hockey canada stuff though, scarves, tshirts, sweaters, jerseys, etc. Should I just wear something red and go with that? This is the most i’ve thought about my personal style since the last wedding i went to, important stuff here.

    9. Tyler Dellow
      October 10, 2011 at

      I’ve had some more emails, so it looks like we’re getting a group that might be of interest. I’m trying to get a little more publicity to see if I can’t increase the size of our group a bit. In the meantime – does anyone have any problem with me sending them an email on a group list in a couple days? Other people who’ve expressed an interest would see your email then. I won’t do it unless people affirm that they’re cool with that.

      • stratedge
        October 12, 2011 at

        I’m cool with it.

        I was thinking, since there’s a fair bit of positive response to this… how about having a handful of people “score” each game, and then making an amalgamation of those statistics available (average, median, …)? Maybe just take scores from everyone willing to do so. I am sure in a few weeks you’ll have half as many people as you had to begin with, that’s life.

        I’d also suggest scheduling a chat (or just having a group conversation via email) for discussion of exactly how these turn overs and events will be measured and recorded. Handle a few “what if”s, etc. Like I said earlier, it’s a good time to really have a good discussion and flesh things out, and if it takes a couple days it’s no big deal to go back and redo the 2 prior games. The sooner you get the ball rolling on that, the better, in my opinion.

        Looking forward to this!
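The amalgamation idea above could look something like the following Python sketch, which summarizes one statistic across several scorers of the same game using the standard library’s `statistics` module. The scorer names and counts are hypothetical and the `amalgamate` helper is an assumption, not an agreed method.

```python
from statistics import mean, median


def amalgamate(counts_by_scorer):
    """Summarize one statistic as recorded independently by several scorers."""
    values = list(counts_by_scorer.values())
    if not values:
        raise ValueError("no scorers supplied")
    return {
        "mean": mean(values),
        "median": median(values),
        "n_scorers": len(values),
    }


# Hypothetical example: completed own-zone passes for one game,
# as counted by three different volunteers.
scores = {"scorer_a": 7, "scorer_b": 8, "scorer_c": 6}
summary = amalgamate(scores)
print(summary)
```

The median is less sensitive than the mean to one scorer who counts very differently, which may matter if definitions are still being settled.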

      • Alan
        October 13, 2011 at

        Tyler,

        I’m one of the guys who e-mailed you about this, and I’m good with the group e-mail idea.

        Looking forward to collecting this data. Even though it doesn’t seem like there has been enough time to hash out the details yet, I’m going to take a run at doing this for a portion of the Oilers v. Wild game tonight. I’m hoping that just going through the process once or twice might illuminate some ways to refine the process, or even help identify any potential useful statistics that can be derived from the data.

    10. Tyler Dellow
      October 10, 2011 at

      Julian

      Wear a hockey jersey. Just don’t wear a TFC jersey. Also – we should chat at the game, I’m going with my girlfriend. Are you in Toronto full-time now? I know there are some blogger lad meetups coming – were you on the invite list?

      • Julian
        October 10, 2011 at

        Cool, thanks. Yeah, in TO full time. My girlfriend (Ellen) hates soccer though (ok, just tremendously bored), so she isn’t coming. Pat invited us to watch the game Saturday night, so we’ll probably come to that.

        Not sure where I’ll be sitting Tuesday night, but I might wear my 2010 olympics misnumbered Toews jersey, so it should stick out a bit. Might be the only suitable red thing I have.

    11. DSF
      October 10, 2011 at

      Tyler.

      Kind of a detour…but I’ve always thought tracking time of possession and breaking that down into individuals on the ice would be very useful as it is in football.

      Would be far more useful than Corsi in my opinion and not all that hard to track once you define possession.

      • David Staples
        October 11, 2011 at

        I agree DSF. To track puck possession, the best thing is to measure it with a stopwatch. It’s done in Euro soccer. Breaking it down by player is not easy to do, but would be interesting.

        Anyway, good luck to Tyler and his researchers on this project.

        It’s funny: I was recently an evaluator in minor hockey, grading players in games so we could tier them properly.

        In the end, what I did was give a player a tick for a good play and an “X” for a bad play. Really subjective, but it was fast, I could do it in real time, and it gave me a fair indicator, I do believe. It enabled me to judge every play in the game, and I wonder if, in the end, it might give you a similar ranking as you will get from this project.

        Anyway, I like the low-hanging fruit, and I’ll stick with that. But good luck on this. It’s an ambitious project, and that’s what I love to see.

    12. Oilswell
      October 10, 2011 at

      Proper term is crowdsourcing. Couple of thoughts: make everything CC license (may not hold water in some jurisdictions because it is data, but probably worthwhile making it clear), and occasionally overlap times to help estimate inter-coder/error rates.

      Are you recording times so that game context, qual comp and qual team can be related?
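The overlap suggestion above can be illustrated with a short Python sketch. It assumes two coders have labelled the same sequence of events during an overlapping stretch, and computes raw percent agreement plus Cohen’s kappa, a standard chance-corrected agreement measure brought in here for illustration (the comment itself does not name a specific statistic). The labels are hypothetical.

```python
from collections import Counter


def percent_agreement(coder_a, coder_b):
    """Share of events on which both coders recorded the same label."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("need two non-empty label lists of equal length")
    return sum(a == b for a, b in zip(coder_a, coder_b)) / len(coder_a)


def cohens_kappa(coder_a, coder_b):
    """Agreement corrected for the agreement expected by chance."""
    observed = percent_agreement(coder_a, coder_b)
    n = len(coder_a)
    counts_a, counts_b = Counter(coder_a), Counter(coder_b)
    # Chance agreement: product of each coder's label frequencies, summed.
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# Hypothetical labels for six events seen by both coders.
coder_a = ["pass", "pass", "skate", "shot", "pass", "dump_in"]
coder_b = ["pass", "touch", "skate", "shot", "pass", "dump_in"]
print(percent_agreement(coder_a, coder_b), cohens_kappa(coder_a, coder_b))
```

This only works if the two coders’ events can be lined up one to one, which in practice probably means recording a game clock time with each event.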

    13. bill needle
      October 11, 2011 at

      Australian football records number of possessions and “disposals” and the type of disposals (kick, hand pass, etc.) for every game and every player. They are able to announce the leaders in that game after each quarter, and often, during the play. It’s a sport where one player is often covering his counterpart, so you can compare rival players by who’s getting the ball more and passing it to teammates more. While watching the games, I’ve often heard the analyst say teams will have to switch their coverage around because someone on the other team is getting “too much of the ball.”
      I’ve always wondered why hockey couldn’t do such a thing, but it is usually faster paced. But not that much faster.

    14. October 11, 2011 at

      Ultimately, it would be great if a tablet application could be developed for this sort of work. A screen with a rendered rink and the ability to mark events just by touching it…

      I’m getting ahead of myself, clearly.

      • Alan
        October 13, 2011 at

        Kent,

        That’s actually exactly the thought I had, but, agreed that it is quite a ways down the road. Still it would allow for simplistic recording in the future if this type of data collection became something teams wanted to do on a consistent basis.

    15. Roke
      October 11, 2011 at

      I hope you get enough volunteers to get this thing going. The data would be interesting to look at. The Red Wings game that was tracked at Behind the Net was pretty fascinating.

      It’s a shame that it’s difficult to track data in hockey because of the speed of the game. In an ideal world it would be possible to track stuff like Opta tracks soccer – two people watching a game marking stuff down and a supervisor adding detail. I think for hockey you would need to slow the playback of a recording down to make it feasible and God help you if you’re watching a game in Florida with the camera so close to the ice.

      As an aside, have you taken a look at the stuff Chris Boucher tracks watching the Habs? I’m not a fan of him putting everything into a single score and I haven’t bothered to ask how he watches the games and records the events. There is also a fair bit there so I don’t really know where to begin.

    16. October 11, 2011 at

      I’m in.

      While I agree with tracking turnovers because a) in the hands of the NHL they are subject to more institutional abuse than choirboys and b) in the NFL and NBA they correlate well to wins and hence player value.

      But I don’t want to complicate a project just getting under way. I would think that, for the first year, we would want to keep it simple and feel our way through the darkness as we build rigor and consistency in the system.

    17. October 11, 2011 at

      I’m good w/ the e-mails. Let me know when and how you would like to proceed. I’m kinda excited to see where this goes.

    18. Vic Ferrari
      October 11, 2011 at

      Ric said:

      While I agree with tracking turnovers because a) in the hands of the NHL they are subject to more institutional abuse than choirboys …

      That’s hilarious, and spot on, I think.

      • October 11, 2011 at

        They’re a freaking travesty.

        In large part because of the inconsistency.

        The Oilers give away the puck three times more often at home than on the road.

        ?

        Which gives us an opportunity and a challenge: Consistency.

        Perhaps a manifesto on what is what would be in order before we begin?

        Before we notice a huge spike in hits every 4:00 into the third.

    19. Vic Ferrari
      October 11, 2011 at

      I’m impressed that you’re getting volunteers. When I first read your post it struck me as being like a clinical trial designed such that the volunteers were being asked to have the test drug injected into their necks. A good way to do things … but not much of a chance of getting trial patients.

      Turns out Oiler fans are gluttons for punishment. There you go.

    20. DeadmanWaking
      October 12, 2011 at

      Kent: You’re only a few years ahead of yourself. I laugh though that you’re more concerned about the form factor than the computational muscle. We’re still at “original Steve Jobs” in the field of computer vision.

      Open Source Computer Vision

      Small and beautiful comes later.

      I’m thinking of a system that overlays an accurate Cartesian coordinate system onto the ice surface, similar to how the first down line is generated in football.

      To make this happen, you need a training corpus: a body of hockey video with features such as the blue and red lines accurately hand-annotated, to train and validate against.

      Vic is right: getting volunteers for multiple neck injections is often difficult. Fractal compression fell by the wayside when graduate students dried up: it rocks when hand-tuned by slave labour. Often you have to manufacture a work-around. Twenty years ago the Canadian Hansard was a prime resource for natural language translation; one of the highest quality parallel translations available, notwithstanding a few fuddle-duddles. Google’s voice directory service was run to accumulate speech data. Many data mining projects start with a clever ruse.

      One of the problems here, of course, is copyright on the annotated video corpus. Multiple camera angles and smooth camera work are desired. Strikes me that the NHL’s generosity in this department is limited. Perhaps Tyler is not the best ambassador to extend the olive branch.

      And then there’s the time problem. A working open source real-time video illumination system could be three to five years down the road. There won’t be any shortage of computational horsepower with the GPGPU processors now becoming available. It remains a fairly demanding task on the software side.

      At the end of the day, it’s probably less difficult than what Open CV is already doing in the DARPA challenge context. On top of this, you feed in somewhat accurate shift charts, maybe supplement with some CC feeds or speech recognition of player names from the game broadcasts, do a little jersey recognition with Open CV, and then it’s just a quickie for the enthusiastic fan to validate the confluence. With cute little gestures on a touch pad so the needle stings a little less.

      At the same time, it’s hard to believe a professional sports league is ready for this degree of fan participation.

    21. Pingback: The Next Step In Advanced Statistics? | Edmonton Journal
