James Mirtle’s series of pieces on the inventor of hockey statistics Gabe Desjardins and hockey analytics generally in the Globe and Mail a little while back touched off a bunch of interesting stuff. I always enjoy reading smart people who are intellectually honest and disagree with me. Elliotte Friedman, Tom Benjamin and my buddy Rajeev all fit that description.
Benjamin wrote a pretty lengthy post at his site that I think warrants some consideration – I encourage people to go read it. With that said, I have significant problems with what he said. Tom quite properly acknowledges that hockey analytics is pretty fantastic when it comes to blowing up media myths. He goes on to say, though, that hockey stats will never provide enough insight:
Hockey statistics will never do what we want them to do, which is to effectively evaluate individual hockey players. To give us answers when considering a trade or a personnel decision. To tell us whether this third line winger creates more wins than that number four defenseman. Baseball statistics can produce answers to these kinds of questions, while hockey statistics can only produce more questions.
Why? Because baseball statistics describe what actually happens in baseball games. The stats add up to runs and to wins. Hockey statistics do not add up to goals. Goals or proxies for goals like shots, shots and attempted shots or even quality shots, underpin all the analyses. None of these statistics say anything about how the scoring chance, the goal was achieved. The actual activities that go into creating the chance are not recorded. The fundamental problem is that hockey has the equivalent of runs, but the hits and walks that create those runs are missing from the statistical package.
I agree with Tom to a point on this. I can’t agree that, even in their relatively primitive state, hockey stats can’t give us answers when considering a trade or personnel decision. They may not be precise yet, to the point where we can say “Player X is worth precisely $Y.” They can inform our decisions though. We can learn to avoid guys coming off percentage-fuelled seasons. (Most of us can; some of us seem to be the sort of people who just keep touching the stove element.) This, in and of itself, is important stuff – a stack of money has been wasted by NHL teams on these guys.
He’s right, though, that baseball statistics describe what happens in a game while goals and assists do not. Accepting that, I don’t conclude that hockey stats can never give us more precise answers. I conclude that we need to start generating better statistics.
Since I first got interested in hockey statistics and analytic stuff back in 2002 or 2003, I have learned a ton of stuff about NHL hockey that I didn’t know before. The impact of randomness relative to the impact of skill on the percentages was news to me. The predictive power of things like Corsi and shots in terms of reflecting skill rather than randomness was too. The limited impact of shot quality was news. The way in which I think about hockey has been completely changed as a result of this stuff. I do, I think, have a more accurate view of things now than I once did. What’s more, I can back up things that I say and think in a way that I couldn’t when I started getting into this stuff. I’ve got a pretty decent track record of pointing out when NHL GMs do completely ridiculous things – I attribute a lot of that to the understanding of the game I’ve gleaned from numbers.
Tom does, it seems to me, concede the critical question:
With lesser players – the real challenge in objective evaluation – the individual numbers mean even less because different players contribute in entirely different ways. There are reliable defensive players who seldom make a mistake, but provide almost no offence. There are other defensemen who can make a good pass and score once in a while, but make too many mistakes without the puck. There are forwards who can score but do little to help the team move the puck into scoring position. There are forwards who can win puck battles, kill penalties and even move the puck, but their hands are stone. Goaltenders don’t do anything except guard the net.
It is the mixture of all the skills, the collective skillset, that produces team strengths and team strengths win hockey games. Team speed. Team toughness. Team goaltending. Team defense. Team offense.
We can’t objectively sort out the individual contributions to those team strengths. Until we can find the hockey equivalent of singles, doubles and triples from the organized chaos of the game, the statistical evaluation of individual hockey players with disparate skills is a mug’s game.
I think we are, to a degree, closer than he thinks now. I don’t quite agree with him on his hands of stone thing. When you start talking about lesser players in the NHL, leaving out the real plugs like Sandy McCarthy, you’re talking about players who probably have true talent on-ice S%’s of somewhere between 7% and 8.5% or so at 5v5. Randomness smears things so much in a single season that the difference is effectively a wash for analytical purposes.
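For anyone who wants to see why a single season washes out a gap like that, here is a rough back-of-the-envelope sketch. It treats every shot as an independent coin flip, which is a simplification, and the 300 on-ice shots figure is purely my assumption for a depth player's 5v5 season, not a number from any data set. The function name is made up for illustration.

```python
import math


def shooting_pct_se(true_pct: float, shots: int) -> float:
    """Standard error of an observed shooting percentage.

    Assumes each shot is an independent trial that goes in with
    probability true_pct (a simple binomial model).
    """
    return math.sqrt(true_pct * (1.0 - true_pct) / shots)


# Hypothetical workload: on-ice 5v5 shots for in one season (an assumption).
SHOTS = 300

# Talent range for lesser players, taken from the paragraph above.
LOW, HIGH = 0.07, 0.085

midpoint = (LOW + HIGH) / 2
se = shooting_pct_se(midpoint, SHOTS)
talent_gap = HIGH - LOW

print(f"one-season noise (1 SD): {se * 100:.2f} percentage points")
print(f"full talent range:       {talent_gap * 100:.2f} percentage points")
```

Under those assumptions, one standard deviation of pure luck comes out to roughly 1.5 percentage points, which is about as wide as the entire talent range. Change SHOTS to see how many seasons it would take before the gap starts to stand out from the noise.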
There are things we can do to tease out the impact that individual players make on the game. We know how important it is to have an edge in shots over your opposition. What we need is a way to identify the players who create that edge and how they create it. The singles and doubles and triples of hockey. Soccer’s enjoying a bit of a tactics moment right now, with people like Jonathan Wilson and Michael Cox writing intelligently about tactics. They (in particular, Cox) are greatly helped by the data that’s available with respect to passes, average position on the field and such things.
As I’ve become a soccer fan, I’ve become really cognizant of the similarities between hockey and soccer. It’s funny – I think Tom would concede that these exist, because in the comments to his post, he classes hockey and soccer as games that aren’t susceptible to statistical analysis. I’d noticed it before, particularly when watching Olympic hockey, where players hold the puck longer. The dimensions and surface are different, and create some differences between the games, but ultimately, both are fluid games where you want to limit the opposition’s quality shots while maximizing your own. Soccer has, I think, moved a long way in front of hockey in terms of tracking the right stuff to answer these sorts of questions – there was a great article in the Financial Times earlier this year that’s worth reading if you’re interested in quantitative data and sport. An excerpt:
Yet by the mid-2000s, the numbers men in football were becoming uneasily aware that many of the stats they had been trusting for years were useless. In any industry, people use the data they have. The data companies had initially calculated passes, tackles and kilometres per player, and so the clubs had used these numbers to judge players. However, it was becoming clear that these raw stats – which now get beamed up on TV during big games – mean little. Forde remembers the early hunt for meaning in the data on kilometres. “Can we find a correlation between total distance covered and winning? And the answer was invariably no.”
Tackles seemed a poor indicator too. There was the awkward issue of the great Italian defender Paolo Maldini. “He made one tackle every two games,” Forde noted ruefully. Maldini positioned himself so well that he didn’t need to tackle. That rather argued against judging defenders on their number of tackles, the way Ferguson had when he sold Stam. Forde said, “I sat in many meetings at Bolton, and I look back now and think ‘Wow, we hammered the team over something that now we think is not relevant.’” Looking back at the early years of data, Fleig concludes: “We should be looking at something far more important.”
That is starting to happen now. Football’s “quants” are isolating the numbers that matter. “A lot of that is proprietary,” Forde told me. “The club has been very supportive of this particular space, so we want to keep some of it back.” But the quants will discuss certain findings that are becoming common knowledge in soccer. For instance, rather than looking at kilometres covered, clubs now prefer to look at distances run at top speed. “There is a correlation between the number of sprints and winning,” Daniele Tognaccini, AC Milan’s chief athletics coach, told me in 2008.
It strikes me that there’s probably a lot of stuff that can be mined in hockey, stuff that would tell us a lot about which players are most valuable. As I mentioned above, I’d be fascinated to see what sorts of things correlate well with possession and which players do those sorts of things. We’re getting to the point now that we know a lot of the stuff that is tracked is useless, I think. Hits and giveaways and takeaways…nobody serious cares about the data the NHL is generating there. I have increasing distaste for assists and points.
The real difference between baseball and hockey is that baseball’s essential data is largely public. The singles and doubles show in the statistical record. With hockey and soccer, that’s not the case – someone needs to figure out what the essential pieces are and, due to the cost and proprietary interest in that not becoming known, it’s unlikely to ever become public record. Soccer data is notoriously limited and controlled by the companies that collect it.
Hockey’s at least a decade behind soccer though. There’s no reason that has to be the case. There’s also no reason, given the number of insights into the game that have been generated from the relatively crappy set of data that the NHL collects at present, to expect that this will continue. You give someone like Gabe or me a database of every touch that happens in a season and where it occurs, and I suspect a lot of really useful stuff would come up. That’s the future. I don’t even think it’s that expensive – if some guy put me and Gabe Desjardins in a room for a year and gave us a million bucks to spend on generating the data, I expect we’d find all sorts of stuff. That’s pretty cheap – if the Oilers did it, we could pretty much pay for ourselves just by stopping them from making one signing. Until that time comes though, when someone goes out and puts that together, there’s still a lot that we can learn from the data that does exist. I’m more of a believer than I’ve ever been that teams that don’t get into this stuff will end up being left behind and having to play a significant amount of catch-up.