• Atom balls

    by  • November 2, 2008 • Uncategorized • 39 Comments

    The pants pissing that’s been going on to this point in the season amongst Oilers fans has been thoroughly entertaining. I’ve actually become more optimistic than I was to start the season – if MacT can find a way to stem the hemorrhaging on the bottom two lines and if the kids are really not going to get brutally outshot, he might have himself a playoff contender a year earlier than I figured he would. I listened a bit to Gregor’s show yesterday (mark me down as having preferred Stauffer) and one of the hot topics was the inability of guys like Shawn Horcoff, Erik Cole, Ales Hemsky, Sam Gagner and Dustin Penner to score. The volume of the “Fire MacT” crowd has been turned way up as well.

    Matheson touched on this the other day as well, pointing out the failures of those guys to score at ES. I’m having a hard time getting worked up about this. Here’s a summary of the ES shooting and scoring for Horcoff, Cole, Hemsky, Gagner and Penner so far this year, as well as their numbers for 2003-08:

    shooting1

    To me, the shooting percentage, that’s just something that comes and goes. While it’s certainly frustrating that those guys aren’t scoring goals and it’s a topic to fill radio time with, the results that these guys have obtained this year aren’t out of line with the streakiness that they’ve demonstrated to date. The really interesting thing to me is that all of those guys are taking shots at well above their career rates. While the “it’s early” caveat obviously applies to that as well, I get less concerned about guys who aren’t scoring when they’re hitting the ball hard. I haven’t seen a ton of Oilers games so far this year, what with going to the ALCS and WS but, from what I’ve seen and what I’ve heard on the radio (even once you apply a discount for the Phillips effect), these guys have had chances – they’re not just generating a ton of shots from the outside.

    I distinctly remember Horcoff hitting a crossbar in the Chicago game, I vaguely recall Hemsky hitting post somewhere and Cole had a breakaway the other night. I’m not particularly well positioned to argue that they’re getting their chances but my sense of it all is that they are. Given that they’re all getting a lot of shots for them, it seems to me that if you want to make the argument that there’s something fundamentally wrong with them other than needing a little affection from the Hockey Gods, you’ve got an uphill climb.

    As you can see from the table above, Horcoff is 1/26 in terms of ES shooting this year. Hemsky and Cole are at 0/21 in ES shooting and Gagner is at 0/19. I’ve got the data on their ES shots from 2003-08. Using Horcoff as an example, he has taken 435 ES shots during that time (empty netters excluded). You can view that as 410 overlapping 26-shot sets, i.e. shots 1-26, 2-27, etc. If you look at it that way, what he’s going through, while unusual, isn’t unprecedented – for his career, he’s been 1/26 or 0/26 on his most recent 26 ES shots about 16.8% of the time. Cole’s been 0/21 7% of the time. Hemsky’s been 0/21 9.9% of the time.
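The rolling-window count described above can be sketched in a few lines of Python. This is a minimal illustration, not the spreadsheet actually used for the table: the function names and the short shot log are made up for the example, and a real run would feed in the full 435-entry list of ES shots (1 for a goal, 0 otherwise).

```python
def num_windows(total_shots, window):
    """Number of overlapping runs of `window` consecutive shots."""
    return total_shots - window + 1


def count_windows(shots, window):
    """Map goals-in-window -> how many overlapping windows had that many goals."""
    if len(shots) < window:
        raise ValueError("need at least `window` shots")
    counts = {}
    goals = sum(shots[:window])          # goals in the first window
    counts[goals] = 1
    for i in range(window, len(shots)):
        # Slide the window one shot forward: add the new shot, drop the oldest.
        goals += shots[i] - shots[i - window]
        counts[goals] = counts.get(goals, 0) + 1
    return counts


# Hypothetical ten-shot log, NOT real data: 1 = goal, 0 = no goal.
toy_shots = [0, 0, 0, 0, 1, 0, 0, 1, 1, 0]
toy_counts = count_windows(toy_shots, window=4)
cold_share = (toy_counts.get(0, 0) + toy_counts.get(1, 0)) / sum(toy_counts.values())

print(num_windows(435, 26))   # 435 shots, 26-shot windows
print(toy_counts, cold_share)
```

With the real shot log and `window=26`, the share of windows with zero or one goal is the kind of figure quoted above as 16.8%.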

    shooting2

    All things considered, this doesn’t seem like it’s something worth getting worked up about. More broadly, it’s kind of funny – you hear about how sports is good for kids because it’s like life and teaches them that hard work is rewarded and all that. If the world was a more cynical place, maybe we’d hear that sports are good because they teach you that much of life is out of your control – you can work hard and do the right thing but there’s still an awful lot of everything left to fate. Even worse, you can expect to be blamed, even if you work hard and do the right thing and the results aren’t there.

    The most that you can hope for is that when luck smiles on you, when you’ve put in 8 of your last 26 and you’re coming off a two-point night in which you added the shootout winner, people don’t call it a fluke; instead, they write stories praising you for the time that you spent at a Mexican stick factory in the offseason. You won’t deserve the praise but, hell, maybe it’ll make the unwarranted criticism easier to take.

    (By the way: the Mexican stick factory story is dated December 18, 2007. Since that date, Horcoff has 5 ES goals on 65 ES shots, for a 7.7% ES shooting percentage, which is a poor shooting percentage for a forward and probably particularly one who plays with Hemsky. I’d like to see the local media aggressively pursuing this angle. If the Mexican stick factory visit “fixed” Horcoff, what’s wrong now? You simply CANNOT let these issues slide.)


    39 Responses to Atom balls

    1. slipper
      November 2, 2008 at

      It’s obvious to me that the problem is the Oilers are having trouble visualizing success.

      Do they still have that sports psychologist on payroll?

    2. November 2, 2008 at

      Terrific stuff, Tyler.

      I think that a decent way to guess what sort of streaks should be expected, just by chance, is to use a box of chocolates as an analogy.

      So if Horcoff (53 ESG on 435 ESS) were to be given a box of 435 chocolates, and they all looked identical, but 53 had a delicious Grand Marnier centre, while the rest were hard toffee centred … if Shawn picked 26 chocolates from the box, what are the odds of him not lucking in to even a single Grand Marnier centre? Going ofer.

      The answer: 3.1%

      BTW: You calculate that in Excel using =HYPGEOMDIST(0,26,53,435)

      And your table shows that he’s had 0 for 26 streaks 3.4% of the time.

      And his chances of just one Grand Marnier chocolate? No more and no less. That’s 11.8%

      And from your table, he’s had 1 for 26 streaks 13.4% of the time.
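For anyone without Excel handy, the box-of-chocolates calculation can be sketched in Python with only the standard library. The function name is my own; the argument order is chosen to mirror =HYPGEOMDIST(0,26,53,435), and the result should land on roughly the 3.1% and 11.8% quoted above.

```python
from math import comb


def hypergeom_pmf(hits, draws, successes, population):
    """Chance of exactly `hits` successes when drawing `draws` items
    without replacement from `population` items, `successes` of which are good."""
    return (comb(successes, hits)
            * comb(population - successes, draws - hits)
            / comb(population, draws))


# 435 chocolates, 53 with the good centre, 26 picked.
p_zero = hypergeom_pmf(0, 26, 53, 435)
p_one = hypergeom_pmf(1, 26, 53, 435)

print(f"0 for 26: {p_zero:.1%}")
print(f"1 for 26: {p_one:.1%}")
```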

      And so on and so on.

      Nothing here is too shocking to me. These streaks exist because they must, I think. And in the right measure.

      A Monte Carlo simulation would be better, but the chocolates analogy is simpler and works out near as dammit for these guys. And of course chocolates be chocolates, but hockey has players getting better with experience and getting worse with age, and players are changing roles and linemates and playing through injury, and so on. Still, it’s amazing how well this analogy works for these guys above. And surely for everyone in the league.

      As you would expect, I’m sure. But Jason Gregor and his regular callers aren’t going to embrace this kind of thinking at all.

      Moreau talked about this 5 or 6 years ago. He’d just broken a ten game goal-less drought and the interviewer asked him if he was relieved. He said something along the lines of “for me, 15 goals is a good year, so I’ll probably have a couple 10 game streaks without scoring in a year, just the way it goes.” That cat is oddly rational and unflappable.

      I may have the numbers wrong, but it was something like that. And I’m sure he’s right.

    3. mc79hockey
      November 2, 2008 at

      Hemsky and Cole with ES goals in the first period versus Carolina. I KNEW that taking the time to finish this post last night was a good idea.

    4. mc79hockey
      November 2, 2008 at

      Vic -

      I was fiddling around with the BINOMDIST function and this info. Of course, it’s not going to work out exactly, as every one of Horcoff’s shots doesn’t have a .122 chance of going in and he probably has runs where he’s taking higher percentage shots and runs where he’s taking lower percentage shots, but it seemed reasonably accurate to me.

    5. November 2, 2008 at

      Yeah, we all gravitate to that, I certainly do. It’s slipper’s Roulette Wheel, and the reason that Trivia Dice makes a world of sense to MacLean and DMFB (has he retired from blogging btw? I’ve heard rumblings), but to almost everyone else it seems like pure madness.

      The correlation of the roulette wheel (shake them bones!) is .95 for Cole, he was the guy with the longest history, and you put it up in image format and I don’t like transcribing, so that’s all I did. It will be more or less the same for everyone in the league though, with a similar number of games played and without freaky injuries, or two seasons on Mario’s line, in play.

      The correlation with Poisson (“Alan Ryder says the Hockey Gods are throwing lightning bolts at random! Mudcrutch confirms! Run, fuckers, run!”)

      Also .95

      Correlation with “life is just a box of chocolates” is also .95.
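One plausible reason all three correlations come out the same is that the three models barely differ at these sample sizes. The sketch below is my own illustration, using Horcoff's 53 goals on 435 shots and a 26-shot window rather than Cole's numbers (which are only available as an image): it computes the binomial (roulette wheel), hypergeometric (box of chocolates) and Poisson probabilities for 0 through 8 goals and the Pearson correlation between them.

```python
from math import comb, exp, factorial, sqrt

SHOTS, GOALS, WINDOW = 435, 53, 26
RATE = GOALS / SHOTS                      # prior ES shooting percentage


def binomial(k):
    """Weighted die / roulette wheel: every shot scores with probability RATE."""
    return comb(WINDOW, k) * RATE**k * (1 - RATE) ** (WINDOW - k)


def hypergeometric(k):
    """Box of chocolates: 26 draws without replacement from the 435 shots."""
    return comb(GOALS, k) * comb(SHOTS - GOALS, WINDOW - k) / comb(SHOTS, WINDOW)


def poisson(k):
    """Lightning bolts at random, with the same average goals per window."""
    lam = WINDOW * RATE
    return exp(-lam) * lam**k / factorial(k)


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))


ks = range(9)                             # 0 through 8 goals in 26 shots
b = [binomial(k) for k in ks]
h = [hypergeometric(k) for k in ks]
p = [poisson(k) for k in ks]

print("binomial vs hypergeometric:", pearson(b, h))
print("binomial vs Poisson:       ", pearson(b, p))
```

If the models track each other this closely, any real player's streak table will correlate about equally well with each of them.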

      And the dozens of buggers who have come onto the internet and questioned sensible hockey stuff with “I’d like to see a chi square test” … well, they actually finally have an example that lends itself to that. Props to you for providing it, and assuming that my quick read of the wiki page was correct.

      Of course when the kids say these things they are really saying “that just seems wrong to me, maybe I’ll just throw out some bullshit I don’t really even understand myself and hope it inspires Bruce et al to continue doing the great work that makes sense to me intuitively”.

      Little known fact: The kids think in run on sentences.

      Apply that to your stuff here and the p values are so small that the human brain can’t grasp them. Best to invert.

      And I’m with you, I’m standing between you and PDO making the smart bets at slipper’s wheel of fortune, knowing full well that there’s a cheat in there somewhere, but it’s not so significant that you can find it.

      And the chi square test gives a P value (on the assumption that I’ve done it right) of about 1 million to one, in terms of odds.

      And Poisson … thousands to one is nothing to sniff at.

      And “box of chocolates”, with that air of predestination that makes all the sensible folks around here uncomfortable … 35 million to one on Cole. Damn. Maybe just a one-off, I dunno. But I’ll work on the assumption that MacTavish was raised Lutheran until I’m told otherwise. :)

    6. November 3, 2008 at

      I will be perfectly honest with you, Vic. I have a strong sense of hostility towards you, not because I’m inflexible, but because frankly, the presentation and defence of the ideas you present is such that it engenders hostility in those who aren’t immediately inclined to go along with it. But hey, if the stats I understand agree with the stats you understand, I can go along with it. However, before I completely capitulate on the point, I do think I should make a suggestion that would greatly improve the reception of your work as a whole.

      Show your work.

      The advancement of any sort of research hinges on others’ understanding of the work being done. My stats background is minimal and mostly focused on its application in research labs in the hard sciences; the pure-math element of it is still a bit lost on me, which I can admit now, because I have no stake in any sort of argument. A lot of people don’t even have that much. If you were to explain why you chose a given method, how it works, and why it’s relevant, provide calculations if necessary/feasible, then explain the result, and note the limitations and possible future avenues of research, as you would see in “the real world,” there are a lot of people who would probably be a lot more receptive. I mean, if you’re going to make some pretty counter-intuitive claims, you can’t get mad at people and insult them for disagreeing or not understanding if you don’t fully explain where things come from and why they work, yet that’s precisely what you’ve done, and there’s been a lot of unfortunate shit said that really didn’t have to be because of it.

      It’s a lot of work; I know that. I’ve written a few doorstops in my time, believe me. But I think it would really enhance the reception of the work you do, and I think it would also open your work to more targeted, rational criticism, rather than the general hostility that you currently see, and I can’t see that as anything but good for everyone involved. Like I said, it’s just a suggestion, but you might be surprised how it goes over in the long term.

    7. November 3, 2008 at

      But it’s not counter intuitive, not at all.

      The reason that Tyler ran the binomdist() function in Excel is because he was building a model. Something that isn’t done enough. You can tell by the way he mocks Oiler fandom every time there is a panic over a streak of any sort. Has for years.

      So what he is calculating are the numbers for his model. =BINOMDIST(0,26,53/435,0)*410 starts from the odds of a dice roller going 0 for 26 when rolling a die that is weighted to land as a “6” 53 times out of 435 (53/435 is Horcoff’s prior EV shooting%).

      Then it multiplies by 410, because that’s the number of possible 26-shot streaks the dice roller could have had.

      The dice roller would be expected to average 13.99 of these 0 for 26 streaks BTW. Horcoff has 14 in his career if Tyler’s numbers are right.

      Then =BINOMDIST(1,26,53/435,0)*410 to calculate the expected number of times going 1 for 26 with these dice.

      And so on through to going 8 for 26.
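The same list can be produced outside Excel. Here is a minimal Python sketch of the weighted-die model, with my own function name standing in for =BINOMDIST(k,26,53/435,0); the zero-goal entry should come out at roughly the 13.99 mentioned above.

```python
from math import comb


def binom_pmf(k, trials, p):
    """Chance of exactly k successes in `trials` independent tries (BINOMDIST with cumulative=0)."""
    return comb(trials, k) * p**k * (1 - p) ** (trials - k)


SHOOTING_PCT = 53 / 435       # Horcoff's prior EV shooting percentage
WINDOW = 26                   # shots per streak
STREAKS = 435 - WINDOW + 1    # 410 overlapping 26-shot streaks

# Expected number of streaks with 0, 1, ..., 8 goals for the dice roller.
expected_streaks = {k: STREAKS * binom_pmf(k, WINDOW, SHOOTING_PCT) for k in range(9)}

for goals, count in expected_streaks.items():
    print(f"{goals} for {WINDOW}: {count:.2f} expected streaks")
```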

      And you get a list of the expected streaks for the dice roller. And you compare that to the actual for Horcoff, and the pattern is obvious, even though, as mudcrutch says, the model is a bit crude because the dice are consistently weighted, and in hockey sometimes the five bell chances come to you in stretches as well.

      Still, the dice roller and Horcoff are so damn similar, that obviously it is, as mudcrutch suspected, mostly just luck.

      And even though applying the chi-square test is a bit redundant, it applies here.

      Horcoff’s results are the data from the survey (how streaky was the hockey player?) and the predicted results are the ones you’ve just made using excel and following my step by step instructions.

      Mudcrutch thinks that this streakiness is almost entirely just the bounces (hence the name of his post), so he’s predicted the streakiness of Horcoff using a dice roller model.

      =CHITEST(Horcoff’s streaks, the dice rollers streaks)

      Then hit enter and voila, p = 0.000004.
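For readers who want to see what =CHITEST is doing under the hood, here is a rough standard-library sketch. Several things in it are assumptions: the observed counts are PLACEHOLDERS I made up so the code runs (Horcoff's real streak table is only in the image above), the helper names are mine, and the degrees of freedom follow what I understand CHITEST to use for a single column of data (number of categories minus one).

```python
from math import comb, exp, lgamma, log


def chi_square_stat(observed, expected):
    """Pearson chi-square statistic: sum of (observed - expected)^2 / expected."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chi2_sf(x, df):
    """Upper-tail probability of the chi-square distribution, computed from
    the series for the regularized lower incomplete gamma function."""
    if x <= 0:
        return 1.0
    a, z = df / 2, x / 2
    term = 1.0 / a
    total = term
    n = 1
    while term > total * 1e-15:
        term *= z / (a + n)
        total += term
        n += 1
    lower = total * exp(-z + a * log(z) - lgamma(a))
    return max(0.0, 1.0 - lower)


# PLACEHOLDER observed streak counts for 0..8 goals in 26 shots (not real data).
observed = [14, 55, 90, 95, 75, 45, 22, 10, 4]

# Dice-roller expectations, as in =BINOMDIST(k,26,53/435,0)*410.
rate = 53 / 435
expected = [410 * comb(26, k) * rate**k * (1 - rate) ** (26 - k) for k in range(9)]

stat = chi_square_stat(observed, expected)
p_value = chi2_sf(stat, df=len(observed) - 1)
print(stat, p_value)
```

Two caveats, as far as I can tell. The value returned is the probability of a mismatch at least this large if the dice-roller model were true, so larger values mean closer agreement. And the 410 streaks overlap heavily, so the counts are not independent and the result is best treated as a rough guide.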

      In your field any p value

    8. November 3, 2008 at

      Just to add.

      The problem here is not the math or the way that myself, Tyler or others build models. The problem is that some people see things in a hockey game completely differently than others.

      What is wildly counter intuitive to almost everyone who calls in to talk radio (or hosts it) or posts on the fan boards is downright obvious to a lot of the folks who talk hockey with each other around here.

      And I don’t have a teaching bent, I have no need to educate the masses. I’d just like to talk hockey with the cool cats on the Oilogosphere.

      Also, the standard of acceptance on the Oilogosphere is NOT the peer review board at the Journal of Embarrassing Hobbies. This works on the BAMBi principle (which is of course, the acronym for Bet Against Me, Person)

    9. slipper
      November 3, 2008 at

      http://timeonice.com/playershots.php?team=COL&first=20001&last=21230

      I really need Colorado to do well this year. How can a team fare so poorly in terms of missed shots yet do somewhat decently everywhere else? Are missed shots like forecasting future trends in the market?

      http://timeonice.com/playershots.php?team=MIN&first=20001&last=21230

      Minny’s results appear much more cut and dried. They suck in all three categories: shot differential, Fenwick, and missed shot differential.

      http://timeonice.com/playershots.php?team=VAN&first=20001&last=21230

      Vancouver, as well. In fairness to them, that’s an ugly, ugly sked to begin the season with.

    10. November 4, 2008 at

      Also, the standard of acceptance on the Oilogosphere is NOT the peer review board at the Journal of Embarrassing Hobbies.

      Yeah, because remember the election we all had to put Vic in charge of setting standards for the Oilogosphere?

      Of course, funny thing is, Vic actually did sort of show his work in his original comment. The math on streaks is true in principle IF the distribution of goals over shots is random or close to random; there’s no need to refer back to a data set assuming the premise is true.

    11. mc79hockey
      November 4, 2008 at

      The math on streaks is true in principle IF the distribution of goals over shots is random or close to random; there’s no need to refer back to a data set assuming the premise is true.

      I think you’d have a hard time convincing most hockey fans of that without reference to the data sets (and even then, there will be holdouts).

    12. November 4, 2008 at

      Colby said:

      Yeah, because remember the election we all had to put Vic in charge of setting standards for the Oilogosphere?

      That’s not a rule that I made, that’s an observation.

    13. November 4, 2008 at

      Tyler

      I really like Monte Carlo simulations for this type of thing.

      So going by pure “goals happen when they happen” as a model then, using Excel:

      In column A, rows 1 through 435 (EV shots), enter =RAND()

      In column B, rows 1 through 53 (EV goals) enter the number 1.

      In column C, row 26, enter =SUM(B1:B26), then copy that down through C435.

      Then sort columns A:B by column A.

      Count the number of goalless streaks over 26 shots [in cell D1 enter =COUNTIF(C:C,"=0")]

      Count the number of one goal streaks [in cell E1 enter =COUNTIF(C:C,"=1")]

      And so on through 8 goals in 26 shots streaks.

      Then if you write a quick macro to repeat that a thousand times and take the average, it’s going to be the same as the hypergeometric one (life is a box of chocolates). It correlates to the =HYPGEOMDIST() result with r=.99998; that’s close enough for me, and you can run more iterations if you want.
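The spreadsheet recipe above translates fairly directly into Python. This is a sketch under my own naming, not the macro that was actually run: shuffling the list plays the role of sorting column B by the =RAND() column, and the sliding sum plays the role of the =SUM(B1:B26) column copied down.

```python
import random
from math import comb

SHOTS, GOALS, WINDOW = 435, 53, 26


def simulate_once(rng):
    """One shuffled career: count 26-shot windows by number of goals."""
    shots = [1] * GOALS + [0] * (SHOTS - GOALS)
    rng.shuffle(shots)                       # "goals happen when they happen"
    counts = [0] * (WINDOW + 1)
    goals = sum(shots[:WINDOW])
    counts[goals] += 1
    for i in range(WINDOW, SHOTS):
        goals += shots[i] - shots[i - WINDOW]
        counts[goals] += 1
    return counts


def average_counts(runs, seed=0):
    """Average streak counts over `runs` simulated careers (seeded for repeatability)."""
    rng = random.Random(seed)
    totals = [0] * (WINDOW + 1)
    for _ in range(runs):
        for k, c in enumerate(simulate_once(rng)):
            totals[k] += c
    return [t / runs for t in totals]


def hypergeom_expected(k):
    """Expected number of k-goal windows under the box-of-chocolates model."""
    windows = SHOTS - WINDOW + 1
    return windows * comb(GOALS, k) * comb(SHOTS - GOALS, WINDOW - k) / comb(SHOTS, WINDOW)


avg = average_counts(1000)
for k in range(9):
    print(k, round(avg[k], 2), round(hypergeom_expected(k), 2))
```

Keeping the individual runs from `simulate_once`, rather than only the average, is what lets you look at the spread between simulated careers.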

      But it wasn’t a total waste of ten minutes. Because while the average of the simulations becomes the hypergeometric (r=.99997 in my test), the spread of the individual simulations has value.

      Horcoff’s “streakiness” correlates to the hypergeometric with r=.9317 if I haven’t made a transcribing error.

      And in this run of 1000 simulations, 558 times Horcoff’s distribution of streaks correlated to the hypergeometric better than the individual simulation did.

      So, Shawn has been ever so slightly more consistent than the guy picking chocolates. But there is so little difference that it doesn’t merit discussion.

      Makes sense Tyler, no?

    14. November 4, 2008 at

      edit for above: the correlation of the average of the sims to hypergeometric was .9999767. I truncated once and rounded once when posting above, though I have no idea why I mentioned it twice.

      Every simulation is going to end up with a nearly exact hypergeometric as an average, because that’s exactly what you’re building by definition.

      I ran another 1000 and the correlation is r= .999915. Again, absurdly high, 100 is plenty to nail down the average it would seem.

      On the other hand, this time Horcoff was MORE streaky than the sims 555 times out of 1000.

    15. November 4, 2008 at

      Sorry to ramble on here, but the point is that from this place you can build a model that introduces an element of “squeezing the stick”, to the model and see how much you have to add to make the results mesh best.

      Of course in this particular case, looking at this one player, and knowing that linemates and shoulder/wrist injuries etc make a difference as well … there is just almost nothing to see outside of the bounces. But in other instances, such as the effect of a defender on shot quality, the effect is going to be small of course, and the quality of opposition will likely be a big part of that small slice of the pie. Still, it seems like a sensible way of trying to quantify something like that.

    16. November 4, 2008 at

      Also Tyler,

      Has anyone done this sort of analysis with hitting in baseball? Intuitively I would think that while most players may well average the expected “batting average with runners in scoring position” (or similar) over their careers, that the good and bad streaks are longer than chance alone can explain for most players.

      I may well be wrong, just my sense of it.

    17. November 4, 2008 at

      Also, the standard of acceptance on the Oilogosphere is NOT the peer review board at the Journal of Embarrassing Hobbies.

      Oh, Vic. Maybe I’m wrong, and you should never change. =) For the record, though, it’s Ultrasound in Medicine and Biology (May 2008 issue, pages 730-740).

      Of course, funny thing is, Vic actually did sort of show his work in his original comment.

      It was more of a general comment on how things work around here than on that particular comment; I understand Vic isn’t particularly interested in teaching, but if that’s the case, I don’t think it’s very fair to get mad and start ridiculing people if they don’t take the lessons that he isn’t teaching to heart. I also think it would advance discussion further if more “talk radio” people (and I assume I’m one of them; whatever gets you hot, I guess) were able to provide concrete, rational feedback, but given my background, no one should be surprised I’m a bit of a whore for peer review, even if it is kind of a pain in the ass. ;)

      I wonder how some of the folks in Neurophysiology here would respond to the idea of the neurological factors that govern hit vs. miss boiling down to a complete random distribution over a sufficiently large sample size. I dunno, maybe they’d be behind it; I’m biomechanics, not neurophys, so I haven’t really taken enough to say.

    18. November 4, 2008 at

      I wonder how some of the folks in Neurophysiology here would respond to the idea of the neurological factors that govern hit vs. miss boiling down to a complete random distribution over a sufficiently large sample size. I dunno, maybe they’d be behind it; I’m biomechanics, not neurophys, so I haven’t really taken enough to say.

      I can. My wife has a PhD in neuroscience and has 5 publications. There are all sorts of things to take into account such as neurotransmitters and muscle-memory, not to mention concussion or concussion-like symptoms, though I think the stats-based approach just makes the assumption that all professional athletes are reasonably invariable in these things in game day scenarios. On the whole, that may be right, but the neurological factors explain the streaks much better (though obviously we have no useable data to support this).

    19. November 4, 2008 at

      Oh, and just to make it clear that I’m not in any way supporting Vic:

      I’m gonna go out here on a limb and assume your “people don’t understand me because they’re stupid and talk about things they don’t understand” comments (Comment #5)

      Seriously man, I have a degree in Philosophy, so you can be sure at the very least my logic is pretty damn good. I also have a minor in both English and Mathematics (with a heavy slant towards statistics) so you can be reasonably assured that I both understand English, and understand stats.

      I know it’s the internet and all, but for the love of god, assuming (or arguing) that everyone around you are idiots because they don’t use preset formulae in Excel, or want more rigor in the calculation (Strengths AND Weaknesses) to take them seriously is ridiculous. I mean seriously, one of these days you have to stop belittling people that disagree with you when they know as much (and possibly more) as you do, right?

      Right?

      No?

      Ok.

    20. November 4, 2008 at

      Heh, I may have a minor in English, but the internet makes me not proofread:

      I’m gonna go out here on a limb and assume your “people don’t understand me because they’re stupid and talk about things they don’t understand” comments (Comment #5) include me.

    21. mc79hockey
      November 4, 2008 at

      The comments section here seems to have suddenly turned into Golgotha.

    22. November 4, 2008 at

      @Ender: Could be; certainly, there are psychological factors that can explain streaks (the aforementioned “stick squeezing”), so certain neurological factors would logically also go into it. I was thinking more of the reason why an NHL player can aim top glove and have any of the following happen:

      1) Hit the goalie (i.e. miss inside)
      2) Hit the post/crossbar
      3) Miss outside
      4) Hit the target, with the goalie either saving or not

      There’s something about that being a random distribution over a large sample size that’s both creepy and intriguing. I dunno.

      I think the stats-based approach just makes the assumption that all professional athletes are reasonably invariable in these things in game day scenarios

      That’s certainly the fundamental assumption behind the field of sports psychology: all things considered, the average Olympian or NHLer is going to be in approximately the same physical condition, or at least good enough that it’s not going to be the primary factor in a win/loss, and the key to a successful performance is being focused, alert but not hyperstimulated, and capable of responding to adverse stimuli (being at the back of the pack after the first lap; two quick goals in the first) in a positive fashion.

    23. November 4, 2008 at

      @Tyler: It was never my intention to turn this into a crucifixion; I was just sharing a thought that I figured would help Vic make his points better to some of the people who don’t much care for him.

      (And yes, I had to look it up; for a sec there, I thought you were referring to Ender and I pimping our own accomplishments.)

    24. mc79hockey
      November 4, 2008 at

      Just as an aside – I don’t think anyone’s saying that the hit/miss boils down to a complete random distribution. The way I think of things is that each player has a certain game that he plays. That game tends to result in him taking shots from certain spots on the ice, with a certain likelihood of scoring. There’s a sort of innate level of shooting percentage, say. Maybe for Alexander Ovechkin it’s 12% at ES, for Jason Smith it’s 2%. It’s “random” in a sense but, with a large enough sample, should result in that mean. I’m surprised if this is a controversial idea – I think that it’s pretty widely accepted by the saberists.
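To put a rough number on "a large enough sample", here is a small sketch of how much an observed shooting percentage can be expected to wobble around a player's innate rate. It leans on a simplifying assumption that every shot is independent and has the same chance of going in, which is cruder than how real shots work; the function name is mine, and the 12% figure is just the Ovechkin example from above.

```python
from math import sqrt


def shooting_pct_std_error(true_pct, shots):
    """Standard deviation of the observed shooting percentage over `shots`
    independent shots that each score with probability `true_pct`."""
    return sqrt(true_pct * (1 - true_pct) / shots)


for shots in (26, 100, 435, 1000):
    se = shooting_pct_std_error(0.12, shots)
    print(f"{shots:5d} shots: 12% shooter, give or take {se:.1%}")
```

Over 26 shots the wobble is more than half the size of the rate itself, which is why a cold 26-shot stretch says very little about the underlying shooter.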

    25. November 4, 2008 at

      I don’t think anyone’s really complaining about the distribution. If you have a high enough N, pretty much everything will end up a normal distribution, and that’s fine. I think the only issue I have with comments on blogs, or some blogs themselves, is that a lot of people a) use a long-term distribution to “prove” a short-term problem (which may or may not work, and I’m perfectly fine with the exercise) and b) get amazingly hostile and derogatory towards the people who disagree that it proves what the other thinks it proves (which is more where the problem lies).

      I doubt most people around actually think that the numbers they pull out are law, and that to disagree is idiotic. However, it seems that things are occasionally treated that way.

      Basically, I had no issues with Vic’s first comment in this thread. It was informative and relevant. The second, though, is effectively trolling, and I see it often enough over other sites that sooner or later I personally feel the need to say “Whoa guy, wait a sec. I’m not anywhere as stupid and out of touch as you seem to think I am.”

      Apologies for OT.

    26. November 4, 2008 at

      Ender:

      I don’t know if I’ve ever read any comments that you’ve written. Apologies for that, I’m terrible that way. I don’t think that I remembered reading anything from Jonathon on the sphere before I read his blog either, and that wasn’t until June or July of this year, and then only because Dennis gushed about the guy. And it turns out that he’s switched on, so there you go.

      Firstly, I have never, ever, run at anyone unless they were a dink first. Not in real life, and not on the intarweb.

      And in this thread I am criticized by one kid for not presenting any work, then when I do (and in the ONLY format that mudcrutch uses, Excel), kid number two comes to his defense. It would be funny if it wasn’t so damn sad.

      And an increase in N will not drive everything to a normal distribution. Plot out MCs numbers above and have a think.

      And on the numbers that Dennis is doing, way too early now, but with a larger sample we’ll be looking at a Zipf distribution after any sort of sensible application of context. Do you know why, Ender?

      The harsh reality of this is that some people just aren’t switched on, they are remarkably oblivious to much of what happens in a hockey game. As a consequence, they become offended when they read people who DO have a clue talk about the game. Then they become more offended when their remarks are disregarded. Then they tell us about their degree in philosophy, and are shocked when I don’t care.

      Moreau is switched on, and he could be illiterate for all I know. But he’s a guy worth listening to. And you, well it’s early, but it’s not a good start, Ender.

      Now the Oilogosphere isn’t mine of course, in fact I’d be flattering myself if I placed my importance in the top dozen Oilogospherites. And the next stretch looks to be busy for me, so I won’t be talking with the cool cats very much at all over the next few weeks. Still, if I can help delay the inevitable transformation of this place into HFBoardsDeluxe by discouraging folks like you from posting, well, that’s what I will do. And for purely selfish reasons, I’m not doing God’s work.

      I mean Bruce has a blog now, no? Like minds should gather, because I’m just naturally confrontational, and I’m going to guess that both you guys find this sort of thing a bit stressful. And there is no need for it, it’s just the internet after all.

    27. November 4, 2008 at

      “Firstly, I have never, ever, run at anyone unless they were a dink first.”

      Just a quick correction, I’ve triggered myself here. I once sucker punched a guy in the back of the head at a Sass Jordan concert. Funny how the brain works, I’ve been in more than my share of fights as a youngster, but I had no recollection of this until I tuned into Canadian Idol this summer (BTW: it sounds like an idiotic premise, like a big karaoke contest, but it’s oddly compelling, I’d recommend it if they do another season).

      Sass Jordan was one of the judges on the show, and it triggered my memory I guess. I had never remembered ever seeing Sass Jordan (much hotter back then BTW, but smaller boobs for sure). Then in an instant I can even remember watching to see what hand he was drinking with, so I knew which way he would turn. He had a brushcut, and nobody had a brushcut in 1989, and I drifted the giant bastage in the back of the head and waited for him to turn, and he never did, he rolled away.

      Just as well, he would have kicked my ass. Dude was the size of one and a half people.

      Damn, I had always prided myself on never hitting guys while they were down. Something that separated me from other people I knew. The biggest fight me and my Dad ever had was on the subject, one of my brothers had seen me not finish a guy in a fight behind the rink when I was 15 or so, the prick told my old man. We never spoke again before he died about a year later.

      No way in hell I relay this story to anyone I know in real life. But the internet is a different place.

      Strange shit, anyways. And I’m not particularly bothered by it, so I have no idea why my memory had blocked it out. Then again I once remembered that I had dated a woman with the same name as my daughter, but not until she was two years old (word to the wise, husbands, keep that stuff to yourself). So it’s probably just random.

      Any road, I’ll show myself out. Peace out, guys, see you in a month or so.

    28. November 4, 2008 at

      Vic,

      I know I shouldn’t reply to this, but I will anyway.

      I don’t know if I’ve ever read any comments that you’ve written. Apologies for that, I’m terrible that way. I don’t think that I remembered reading anything from Jonathon on the sphere before I read his blog either, and that wasn’t until June or July of this year, and then only because Dennis gushed about the guy. And it turns out that he’s switched on, so there you go.

      Whether or not you remember it, you’ve attacked my comments more than once.

      Firstly, I have never, ever, run at anyone unless they were a dink first. Not in real life, and not on the intarweb.

      That’s a lie.

      And in this thread I am criticized by one kid for not presenting any work, then when I do (and in the ONLY format that mudcrutch uses, Excel). Kid number two comes to his defense. It would be funny if it wasn’t so damn sad.

      I’m not sure how you don’t see this paragraph as horribly derogatory, and comprised of everything I was complaining about. Regardless, if I’m “Kid B,” I wasn’t coming to Doogie’s defense at all. I was coming to my defense. It seems that “I don’t know if I’ve ever read any comments that you’ve written,” applies to the comment you’re replying to as well. See my reply to your second paragraph for more on that.

      And an increase in N will not drive everything to a normal distribution. Plot out MCs numbers above and have a think.

      I didn’t say “everything”. I said “pretty much everything” and that’s true. Look it up.

      And on the numbers that Dennis is doing, way too early now, but with a larger sample we’ll be looking at a Zipf distribution after any sort of sensible application of context. Do you know why, Ender?

      Look to the replies to the previous two paragraphs for the answer to this one.

      The harsh reality of this is that some people just aren’t switched on, they are remarkably oblivious to much of what happens in a hockey game. As a consequence, they become offended when they read people who DO have a clue talk about the game. Then they become more offended when their remarks are disregarded. Then they tell us about their degree in philosophy, and are shocked when I don’t care.

      And here’s the real meat. If someone has what you call “a clue” that seems to mean that they agree with you. There are two words for that attitude: arrogant and ignorant (different words, different meanings, but both apply. Look it up). And again, you might want to reference my reply to your paragraph 2.

      Moreau is switched on, and he could be illiterate for all I know. But he’s a guy worth listening to. And you, well it’s early, but it’s not a good start, Ender.

      See reply to paragraph 2 – Isn’t it amazing how often I’m pointing to that? Seriously, I should just start saying that for each of these paragraphs you should just add the previous comments. Like a recursive deconstruction.

      Now the Oilogosphere isn’t mine of course, in fact I’d be flattering myself if I placed my importance in the top dozen Oilogospherites. And the next stretch looks to be busy for me, so I won’t be talking with the cool cats very much at all over the next few weeks. Still, if I can help delay the inevitable transformation of this place into HFBoardsDeluxe, by discouraging folks like you from posting. Well, that’s what I will do. And for purely selfish reasons, I’m not doing God’s work.

      Derogatory again, towards everyone who doesn’t agree with you. You talk about how you shouldn’t flatter yourself, but proceed to be both ignorant and arrogant, while professing to do God’s work. Ironically or no, I think you’ve just about hit a critical mass for the arrogance and ignorance.

      I mean Bruce has a blog now, no? Like minds should gather, because I’m just naturally confrontational, and I’m going to guess that both you guys find this sort of thing a bit stressful. And there is no need for it, it’s just the internet after all.

      Exactly. You’d hope that with globalism and the influx of more and more ideas you could have respectful conversations with people. You’d expect that race, religion, socioeconomic background, academic background, and even “Switched-onness” would be able to be set aside in a discussion. I’m not sure which exactly is sadder though. Is it the fact that you don’t do that, or that you’re so very quick to do it?

    29. November 4, 2008 at

      It’s “random” in a sense but, with a large enough sample, should result in that mean. I’m surprised if this is a controversial idea – I think that it’s pretty widely accepted by the saberists.

      Of course, I was referring to the distribution about the mean. Sorry, I didn’t specify that.

    30. November 4, 2008 at

      Ender

      Now, now. We both know that someone like me is not going to read a post that long from someone like you.

      If you wish to be blessed with my acknowledgment, you’ll simply have to pare that down.

      Get to work on that, ASAP.

      Peace out,

      Vic

    31. November 6, 2008 at

      I think you’d have a hard time convincing most hockey fans of that without reference to the data sets (and even then, there will be holdouts).

      But you don’t actually have to show that all NHL players have random goal-over-shot distributions, either; it’s enough to make the point if you show that Horcoff’s dry spell, or whomever, would be consistent with a random distribution.
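
      A minimal sketch of that kind of check, assuming every shot is an independent try with the same chance of going in. The shot total, drought length and shooting percentage below are placeholders for illustration, not Horcoff’s actual numbers, and the function name is made up for this example.

```python
def prob_drought(total_shots, drought_len, shooting_pct):
    """Chance that at least one run of `drought_len` straight misses
    shows up somewhere in `total_shots` independent shots."""
    miss = 1.0 - shooting_pct
    # state[j]: probability the current run of misses is j shots long
    # and no drought of the target length has happened yet.
    state = [0.0] * drought_len
    state[0] = 1.0
    seen = 0.0
    for _ in range(total_shots):
        nxt = [0.0] * drought_len
        for run, prob in enumerate(state):
            nxt[0] += prob * shooting_pct      # a goal resets the run
            if run + 1 == drought_len:
                seen += prob * miss            # the drought is completed
            else:
                nxt[run + 1] += prob * miss    # the run grows by one miss
        state = nxt
    return seen


# Hypothetical inputs: 200 shots, a 30-shot drought, 10% shooter.
print(round(prob_drought(200, 30, 0.10), 3))
```

      If the number that comes out is not small, a dry spell of that length is nothing that chance alone can’t explain.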

    32. November 6, 2008 at

      I think you’d have a hard time convincing most hockey fans of that without reference to the data sets (and even then, there will be holdouts).

      But you don’t actually have to show that all NHL players have random goal-over-shot distributions, either; it’s enough to make the point if you show that Horcoff’s dry spell, or whomever, would be consistent with a random distribution.

      I’m probably missing something, but wouldn’t you, then, only be showing that Horcoff’s streaks are random with an overall mean of points? Or is that what you’re trying to do, pick certain players rather than looking at the whole?

      That makes sense, and I think most people would buy that. I’m just not sure how you could “show” that without referencing the data sets.

    33. November 6, 2008 at

      Colby:

      Yeah, though you could make the argument that it’s just coincidence that Horcoff has been exactly as streaky as chance alone would dictate he should be. At least by the ‘n’ that Tyler has chosen (26).

      We could revisit this in a few years, and unless Shawn has some sort of devastating wrist injury or something, the shape of the curve will be the same I’m sure (it will NOT gravitate to a normal distribution, ever, of course. Because the shooting% determines the ‘shape’ of the distribution.)

      There will be some players more streaky than the random distribution though. I would be fairly sure that Chris Simon would be a good example. The guy never had a lot of finish, but everyone who played with Oates in the day surely saw their shooting% go way up. And he had one long stretch of that, and another stretch playing with Yashin on Long Island iirc.

      Rob Brown (Mario effect) is another. Maybe Krushelnyski, I wasn’t an Oiler fan then but I remember him playing for the better part of one year with Gretzky, no? If he did he will have much longer scoring droughts than expected given his career EVshooting%. And far more hot streaks.

      Personally, I’m convinced. Then again I would have been shocked if it had been otherwise, so I’m easier to sell on the principle.

      If Tyler had a bunch of time to kill, he could run a bunch of players, and then plot them all out against the streaks expected through random chance, with a bunch of little bar charts. I think that all people see numbers better when they are visualized linearly, especially a distribution.

      Then again, I think Tyler is convinced, and he certainly doesn’t owe us a thesis on the subject. He’s given us a link to the data before on here. If somebody doubts the conclusion, then they should be encouraged to invest some of their time and analyze even more players, and with a rational and unbiased methodology.

    34. November 6, 2008 at

      “Under certain conditions (such as being independent and identically-distributed with finite variance), the sum of a large number of random variables is approximately normally distributed — this is the central limit theorem.”

      Wikipedia

      Just sayin’.

    35. November 6, 2008 at

      I guess I should qualify that last statement. If you look at MC79’s original table, it appears that Horcoff’s shooting percentage roughly makes a normal curve. There isn’t quite enough data there to call it a Normal Distribution, but think of what he’s doing in another way.

      Shooting percentage can be considered as the amount of “hits” vs “misses.” It’s the classic coin flip example. It’s a weighted coin, but a coin nonetheless. You can graph it as “amount of shots between goals” if you like. If the numbers work out to be reasonably random (like your work in comment #2 shows), we do end up with a Normal Distribution – possibly not over a season, but definitely over a career. This makes intuitive sense. If a person averages 1/Y SH%, that effectively means that he’s likely to score every Y shots or so (obviously that doesn’t mean it’s perfect. We’d actually be looking to the mean — as a sidenote, the standard deviation of this particular normal curve would tell us how streaky the player is). But what we can say, using a Chi-squared test (which you seem to really dislike, Vic) is whether or not a player scoring more than normal or less than normal is significant, or just within the limits of the mathematical model.

      This actually agrees with both MC79’s original post, and your original comment Vic. I’m not sure why you’re content to throw it out, unless you’re trying to argue something that isn’t coming across in your comments.
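
      For what it’s worth, here is a rough sketch of the Chi-squared comparison against the weighted-coin model. The observed counts are made up purely for illustration (they are not from MC79’s table), and the 26-shot window and 10% shooting percentage are assumptions as well.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of exactly k goals on n independent shots."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def expected_counts(windows, n, p, top_bin):
    """Expected number of windows with 0, 1, ..., top_bin-1 goals,
    plus one final bin for top_bin or more goals."""
    probs = [binom_pmf(k, n, p) for k in range(top_bin)]
    probs.append(1.0 - sum(probs))
    return [windows * q for q in probs]


def chi_squared_stat(observed, expected):
    """Pearson's statistic: sum of (observed - expected)^2 / expected."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Made-up data: 40 windows of 26 shots, binned as 0, 1, 2, 3, 4+ goals.
observed = [3, 8, 10, 9, 10]
expected = expected_counts(40, 26, 0.10, top_bin=4)
stat = chi_squared_stat(observed, expected)

# With 5 bins and p fixed in advance there are 4 degrees of freedom;
# the 5% critical value for 4 degrees of freedom is about 9.488.
print(stat, stat > 9.488)
```

      A statistic under the critical value means the observed streakiness can’t be told apart from the coin-flip model at that level. The usual caveat applies: bins with small expected counts make the test less trustworthy.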

    36. November 7, 2008 at

      No Ender, that’s not what he is doing at all. And it will NEVER approach a normal distribution. This is why a chi square test applies (as well as having a small number of nice, fat integers). Otherwise a t-test would be better.

      ‘n’ in this case is 26, the length of the streak. That won’t change even if Horcoff plays until he’s 45 years old. In order for Tyler’s distribution to approach a normal distribution, the NHL EVshooting%’s would have to approach 0. This won’t happen, even though some games it feels like it might.

      If you had been talking about “EVsave% behind skaters”, as I looked at here: http://vhockey.blogspot.com/2007/12/shit-happens.html
      If you read this, I explain explicitly the methodology, and the data is all publicly available btw.

      Then although we are near normally distributed early in the season, this isn’t the old days, we all have bitchin’ computers now, so we can stay with the model and let the computer make trillions of calculations. That way we never lose sight of what we are measuring, and how.

      BTW: That post is dated December 7th of last year. And I predicted the distribution for the same players at the end of the year, based on the shit-happens principle.

      The fact that both the actual Dec.7 results, my modelled Dec.7 results and my modelled end of season results are all essentially normal distributions is immaterial though. The fact that the variance for both the actual and modelled changes to the same degree with the sample size, that’s why it’s impressive.

      Look up the data and plot the actual for last year, it will be right on top of that yellow line I predicted. Because it has to be, it’s just hockey after all.

      In summary:

      1) Tyler’s data and Tyler’s model are NOT normally distributed, and never will become “more normal” regardless of the length of a player’s career.

      2) While I admit to only reading a small portion of what you and Doogie have written above, if you guys have decided that large samples make everything appear random (normal distribution!) then you are missing the point in a way I wouldn’t have even thought possible.

      In many cases, (depending on what me, Tyler, Matt or others are looking at, and the model we’re using) the results are going to be normally distributed. The spread of results is what matters to predictive value and repeatability, the variance.
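
      To put the fixed-‘n’ point in concrete terms, here is a small sketch, assuming goals in a 26-shot window follow a binomial distribution. The 10% shooting percentage is an assumed round number, not anyone’s actual EVshooting%.

```python
from math import comb, sqrt


def binom_pmf(k, n, p):
    """Probability of exactly k goals on n independent shots."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binom_skewness(n, p):
    """Standard result for the skewness of a Binomial(n, p)."""
    return (1 - 2 * p) / sqrt(n * p * (1 - p))


n = 26      # window length used in the thread
p = 0.10    # assumed shooting percentage, for illustration only

# Crude text bar chart of goals per 26-shot window.
for goals in range(7):
    bar = "#" * round(100 * binom_pmf(goals, n, p))
    print(f"{goals} goals: {bar}")

print("skewness:", round(binom_skewness(n, p), 2))
```

      Both the bars and the skewness depend only on n and p. A longer career adds more 26-shot windows to count, but it doesn’t change the lopsided shape they are drawn from.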

    37. November 7, 2008 at

      While I admit to only reading a small portion of what you and Doogie have written above, if you guys have decided that large samples make everything appear random (normal distribution!) then you are missing the point in a way I wouldn’t have even thought possible.

      Maybe you should read it then, because as far as I can see either you’re contradicting yourself or agreeing with me. I mean, you even go as far as to say “The fact that both the actual Dec.7 results, my modelled Dec.7 results and my modelled end of season results are all essentially normal distributions is immaterial though.” Data does normalize in a binomial situation (which is what we have here) over time. Period. You’re saying that Tyler’s n won’t change, because he’s fixed it. OK. I’m saying if you increase the n, you’ll be able to do more with the data. I mean, hell, you’re even saying that you care more about the predictive value (which is what the Chi-squared will give you) and the variance (or standard deviation, which I also mentioned earlier as getting you the data you’re really looking for).

      I mean, you’re arguing that I’m an idiot while agreeing with everything I say.

    38. November 7, 2008 at

      As a sidenote, a T-test and a Chi-squared test will give you two different things. The T-test would likely be more useful in saying whether or not a streak is a significant departure from the norm, while a Chi-squared test would be useful in comparing multiple years for a player, or comparing two players. For example, it would be able to tell us whether Horcoff’s scoring spree last year was a significant departure (and indicative of “improving” vs just being a lucky year at the high end of normal for him).

    39. November 8, 2008 at

      Ender:

      By increasing the ‘n’ here, you mean increasing the length of the streak?

      Because Horcoff could play for a million years and the ‘n’ won’t change. And this distribution will never become ‘normalized’ for that reason.

      Also, the link to the save%-behind-a-skater thing was meant to show the contrast, that IS essentially a normal distribution at that point. That’s why it’s a completely different kettle of fish.

      So you can subtract the variances there, the save% thing (observed variance minus that predicted with my model). Then take the square root of that and divide it by the standard deviation of the observed near-as-dammit-normal distribution … and you have your estimate of the percentage of EVsave% that is driven by the differing abilities of NHL skaters in this regard.

      And that’s a small number, and that’s the main reason that nobody can list the NHL skaters who will have the best EVsave% behind them (relative to teammates) for the rest of this season. If you can pick 15 guys, and have them, collectively, see an extra 3 or 4 pucks stopped per 1000 shots due to their presence on the ice … well that’s impressive, and about as well as anyone can be expected to do.
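
      The variance arithmetic described above can be sketched in a few lines. The function name and the two variances are placeholders, not figures from the linked post.

```python
from math import sqrt


def skill_share(observed_var, model_var):
    """Recipe from the comment: subtract the luck-only (model) variance
    from the observed variance, take the square root, then divide by
    the observed standard deviation."""
    # Guard against sampling noise making the difference negative.
    leftover = max(observed_var - model_var, 0.0)
    return sqrt(leftover) / sqrt(observed_var)


# Placeholder variances, for illustration only.
print(skill_share(observed_var=4.0e-5, model_var=3.6e-5))
```

      When the model variance sits close to the observed variance, the result is small, which is the situation described for EVsave% behind skaters.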
