Ok. I did some stuff and now I'm going to attempt to make a coherent claim out of some of it. Pointing out obvious and maybe not so obvious mistakes: more than welcome.
The question. How much confidence do we have in the claim that some goalies are truly consistent, while others are truly inconsistent? Is something called "consistency" a characteristic of a goalie?
Holiday park last tackled this in "your goalies will vary" by suggesting we look at the variance of SV%. While HP went somewhere else with standard errors and a t-test, I'm going to stay descriptive for the moment. As Snark suggested, I think we can substantiate the claim that consistency is measurable by showing a repeatable standard deviation of a goalie's SV% over a certain number of games.
Put another way. Imagine goalie X. Every game X plays produces a SV% (caveat: which is an incomplete and imperfect indicator of goalie talent and performance, but it's what we got right now). Imagine the spread, distribution, or range of values this produces over time. (I know, right?)
We want to know whether this distribution is a characteristic of goalie X. We can substantiate this with evidence that the distribution is repeatable, non-random. Potentially, if we observe a wide distribution of SV% values game-to-game, and find it relatively repeatable, then we may have found an inconsistent goalie. Conversely, if we observe a narrow distribution of SV% values game-to-game, and find it relatively repeatable, then we may have found a consistent goalie.
In a previous post, I had a brief glance at the 2012-2013 seasons of two goalies: Antti Niemi and Jonathan Quick. By looking at the game logs, it was straightforward to calculate the standard deviation of the game-to-game SV% for both.
Niemi = 0.049
Quick = 0.109
This suggests that Niemi was a consistent goaltender, and Quick was not. But better evidence could be gained by drawing random samples from the 2012-2013 season to see whether the standard deviation holds in those samples. Put it this way: if consistency really is a characteristic of the goalie, our samples should reflect the population. How closely do random samples conform to the spread of the data as a whole?
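For anyone following along, the season-level calculation is simple enough to sketch in a few lines of Python. The game log below is made up (five hypothetical games), not Niemi's or Quick's actual numbers; a real run would read the full 2012-2013 game log:

```python
import statistics

# Hypothetical game log: saves and shots against, one entry per game.
# (Made-up values for illustration -- a real run would use the actual log.)
game_log = [
    {"saves": 28, "shots": 30},
    {"saves": 25, "shots": 29},
    {"saves": 31, "shots": 32},
    {"saves": 20, "shots": 26},
    {"saves": 27, "shots": 27},
]

# Per-game SV% values.
sv_pcts = [g["saves"] / g["shots"] for g in game_log]

# Sample standard deviation of game-to-game SV% (same as Excel's STDEV).
season_stdev = statistics.stdev(sv_pcts)
print(round(season_stdev, 3))
```

Whether you use the sample or the population standard deviation barely matters over a full season of games, though it matters more once you start slicing into 5-game sets.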
1) Formatted the game log I already had, giving each game a number ID.
2) I decided to start off using five (5) game sample sets.
3) Generated 100 random sets of 5 numbers (I used www.randomizer.org to generate them; their form is easy to use and results downloadable). Linked these to the game log via VLOOKUP, and presto, 100 random sets of 5 SV%s. (Note: I decided not to have repeat games. I don't think this matters, but FYI for the stat heads.)
4) Fuck it. Made another 100 random sets of 5 numbers. Just in case.
5) Wtf, why not? Generated 100 random sets of 20 numbers.
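Steps 3-5 can also be replicated without randomizer.org. Here's a sketch in Python, with a made-up season of 40 games standing in for the real game log; `random.sample` already guarantees no repeat games within a set, matching the note in step 3:

```python
import random
import statistics

random.seed(42)  # fixed seed so this sketch is reproducible

# Hypothetical: one SV% value per game for a 40-game season.
# (Made-up values -- a real run would pull these from the game log.)
season_sv = [random.gauss(0.92, 0.05) for _ in range(40)]

def sample_stdevs(sv_by_game, sample_size, n_samples):
    """Draw n_samples random sets of sample_size distinct games
    (no repeats within a set) and return each set's SV% stdev."""
    return [
        statistics.stdev(random.sample(sv_by_game, sample_size))
        for _ in range(n_samples)
    ]

five_game = sample_stdevs(season_sv, 5, 100)
twenty_game = sample_stdevs(season_sv, 20, 100)

# Range of the sampled deviations, as reported below.
print(min(five_game), max(five_game))
print(min(twenty_game), max(twenty_game))
```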
I first looked at the spread of the 5-game samples. Each 5-game sample produced a standard deviation value. What were the minimum and maximum of the standard deviations across the 100 samples?
Niemi first 100 five-game samples = 0.0138 - 0.0895
Niemi second 100 five-game samples = 0.0155 - 0.0965
Niemi 100 twenty-game samples = 0.0308 - 0.0651
Here, we get a sense of the range of deviation for Niemi (recall that the actual standard deviation was 0.049). And for Quick (actual deviation was 0.109):
Quick first 100 five-game samples = 0.0095 - 0.268
Quick second 100 five-game samples = 0.0117 - 0.2721
Quick 100 twenty-game samples = 0.0383 - 0.143
Obviously the 20-game samples produce a tighter concentration around the actual deviation; that should happen given the sample size. Notice also that the spread for Quick was much greater than for Niemi.
For Niemi, in the first set of 100 five-game samples, the standard deviation for game-to-game SV% was 0.045. In the second set of 100 five-game samples, the standard deviation was 0.046. In the set of 100 twenty-game samples, the standard deviation was 0.049. (Breaking down the 100 samples further into sets of 10 discloses a pretty tight concentration around 0.049.)
For Quick, it's a different story. First set of 100 five-game samples = 0.0915. Second set = 0.0808. Twenty-game samples = 0.1018. These are relatively far from the actual deviation of 0.109. Here you go. Tables.
| niemi | 2012-2013 (rs) | 5 game (ss) | 5 game2 (ss) | 20 game (ss) |
| --- | --- | --- | --- | --- |
| gtg mean SV% | 0.922 | 0.923 | 0.919 | 0.922 |
| gtg mean SV% stdev | 0.049 | 0.045 | 0.046 | 0.049 |

Table 1. Niemi results.
| quick | 2012-2013 (rs) | 5 game (ss) | 5 game2 (ss) | 20 game (ss) |
| --- | --- | --- | --- | --- |
| gtg mean SV% | 0.883 | 0.877 | 0.886 | 0.882 |
| gtg mean SV% stdev | 0.109 | 0.0915 | 0.0808 | 0.1018 |

Table 2. Quick results.
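One thing worth flagging: there are two different ways to turn 100 sample sets into the single "stdev" number in those tables, and they don't give the same answer. A sketch of both, again on made-up numbers (this isn't necessarily what my spreadsheet did -- it's an illustration of the two options). The mean-of-per-set-stdevs version is known to run low for small sets, which could be part of why the five-game numbers (0.045, 0.046 for Niemi) sit a touch under the season value of 0.049:

```python
import random
import statistics

random.seed(7)  # fixed seed so this sketch is reproducible

# Hypothetical 40-game season of per-game SV% values (made up).
season_sv = [random.gauss(0.92, 0.05) for _ in range(40)]

# 100 random five-game sets, no repeat games within a set.
sets = [random.sample(season_sv, 5) for _ in range(100)]

# Option A: pool all 500 sampled games, take one stdev over the pool.
pooled = [sv for s in sets for sv in s]
pooled_stdev = statistics.stdev(pooled)

# Option B: take the stdev within each 5-game set, then average
# those 100 stdevs. This one is biased low for small sets.
mean_of_stdevs = statistics.mean(statistics.stdev(s) for s in sets)

print(round(pooled_stdev, 4), round(mean_of_stdevs, 4))
```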
My interpretation is that this is weird, and maybe really problematic. If a goalie is inconsistent, his inconsistency should be repeatable.
Consider the set of 100 twenty-game samples I created for Quick. Each sample consisted of 20 random games from the 2012-2013 season. If he's inconsistent, a random half-season of games should give a result that reflects this inconsistency (measured by a standard deviation). I did this 100 times to make sure the result was repeatable. After 100 samples, the deviation was still off by about 0.007. That may seem insignificant, but for Niemi, the 20-game samples hit his deviation right on the nose. It could mean that inconsistency is harder to identify than consistency, but I don't know about that.
I could keep going by grabbing more goalies from 2012-2013 (and then grabbing more years of data). But am I reading this properly? Did I produce a "repeatable" deviation result with Niemi? If so, it appeared to work less well with Quick. I'd be really interested to hear your diagnoses. More sets of 100 twenty-game samples for Quick? Increase the sample size? I figured 100 would be good enough, but I don't know the burgeoning conventions for hockey statistics.