I never met a standard deviation I didn't like.
It’s good to be back.
As news broke in the early hours of Jan 6th, reports began to surface of a shortened 48-game schedule. As we all know, odd bounces and lucky or unlucky streaks have more influence in small samples, which diminishes the returns for the truly talented. Fewer games mean less time for the cream to rise to the top. The average points-per-game over the last 7 years has been 1.117, which means that the best prediction for points needed to make the playoffs is (1.117 x 48 =) 53.6, or 54 points. That’s 38 fewer points than a full regular season, which on average requires around 92 to make the playoffs. The importance of every game increases: each game accounts for (1/48 =) 2.1% of total points instead of (1/82 =) 1.2%, which makes it about 1.7 times as important. As I’m sure you’ll be sick of hearing by the end of the first week, every point will count from the very beginning of the season.
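As a quick check on the arithmetic above, here is a minimal sketch. The function names are made up for illustration; the only inputs are the 1.117 points-per-game average and the two schedule lengths quoted in the text.

```python
# League-average pace quoted in the text (standings points per team-game).
AVG_POINTS_PER_GAME = 1.117


def playoff_pace(games, ppg=AVG_POINTS_PER_GAME):
    """Points a league-average team collects over `games` games."""
    return ppg * games


def game_share(games):
    """Fraction of the schedule that a single game represents."""
    return 1.0 / games


short, full = 48, 82
print(f"{short} games: {playoff_pace(short):.1f} points, one game = {game_share(short):.1%}")
print(f"{full} games: {playoff_pace(full):.1f} points, one game = {game_share(full):.1%}")
print(f"relative weight of one game: {game_share(short) / game_share(full):.2f}x")
```

The last line is just 82/48, which is where the "about 1.7 times" figure comes from.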
Working through the most important determinants of winning, PDO and Fenwick%, we’ll take an analytic look at what that may actually mean for playoff chances. For the rest of this article I’ll be referring to PDO as either PDO (Sh% + Sv%) or PDOFEN (where Sh% and Sv% are goals per shots plus missed shots, EV, no empty net data). For new visitors to the site (and I can’t actually imagine there being one), PDO is by far the most important stat to understand in hockey. Here is the best description of PDO written to date. I spent much of my lockout time working on this post over at NHLN, which goes into great detail about the importance of PDO. We’ll compare the variation expected in PDOFEN with the potential variation in points, as predicted by PDOFEN and Fenwick%.
Let’s first look at the variation expected in PDOFEN by simple chance alone (it turns out that PDO is mostly chance, with very little attributable skill). The best way to explain how we figure out the distribution of chance is through an example:
Let’s say you load a sock drawer with 10 socks, 9 white and 1 black. Mathematically, the chance of pulling a black sock is 1/10 or 10%, a pretty unlikely event. However, sometimes we would pull a black sock. If we tried this about 30 times, putting the sock back after each pull (call it 1 iteration), we expect to pull about 3 black socks on average. However, life is random, and on some iterations we pull the black sock 4 or 5 times, sometimes not at all. We could plot the number of black socks pulled for each iteration. What we would eventually find after thousands of iterations is a mound-looking thing that is close to a normal distribution. This distribution shows us how likely we are to pull the average (3 socks) or the unlikely 0. This has been worked out mathematically, and is called the normal approximation interval. It tells us the standard deviation (the spread of the distribution) for a proportion (an independent event that has only 2 outcomes, e.g. success or failure). We can use the normal approximation interval to determine the distribution for the "chance" part of PDOFEN.
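The sock drawer is easy to simulate. The sketch below assumes each sock is put back after every pull (so the 10% chance stays fixed) and compares the simulated spread with the normal-approximation standard deviation for a proportion, sqrt(p(1 - p)/n). The function names, the seed, and the 20,000 iterations are illustrative choices, not anything taken from the original analysis.

```python
import math
import random


def black_sock_counts(iterations, pulls=30, p_black=0.10, seed=0):
    """Run `iterations` rounds of `pulls` draws (sock replaced each time)
    and return the number of black socks pulled in each round."""
    rng = random.Random(seed)
    return [sum(rng.random() < p_black for _ in range(pulls))
            for _ in range(iterations)]


def proportion_sd(p, n):
    """Normal-approximation SD of an observed proportion over n trials."""
    return math.sqrt(p * (1.0 - p) / n)


counts = black_sock_counts(iterations=20_000)
mean = sum(counts) / len(counts)
sd = math.sqrt(sum((c - mean) ** 2 for c in counts) / len(counts))

print(f"simulated mean: {mean:.2f} black socks per 30 pulls")
print(f"simulated SD:   {sd:.2f} socks")
# The formula gives the SD of the proportion; multiply by 30 pulls to get socks.
print(f"formula SD:     {30 * proportion_sd(0.10, 30):.2f} socks")
```

Swapping the 10% black-sock chance for a league-average shooting or save percentage, and the 30 pulls for a team's shot count, gives the same kind of chance distribution for each half of PDOFEN.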
The figure above shows the distribution of PDOFEN attributed to chance for 48 games and 82 games. As you can see, the 48-game distribution is much wider than the 82-game distribution. In fact it’s 23% bigger. That extra width is enough, through chance alone, to push a few more teams farther apart than they would be over 82 games.
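For a rough sense of how the width scales with schedule length, here is a sketch under one loud assumption: shots per game stay constant, so the chance SD of a proportion shrinks with 1/sqrt(games). The figure's own numbers may have been computed differently, so treat this as a sanity check rather than a reproduction.

```python
import math


def relative_width(games_short, games_long):
    """Ratio of chance SDs (short / long), assuming SD is proportional
    to 1/sqrt(games). This scaling is an assumption, not the author's method."""
    return math.sqrt(games_long / games_short)


ratio = relative_width(48, 82)
print(f"48-game spread / 82-game spread: {ratio:.3f}")
print(f"48-game spread is {ratio - 1:.0%} wider")
print(f"82-game spread is {1 - 1 / ratio:.0%} narrower")
```

Under that assumption the two ways of stating the gap differ: the short schedule is about 31% wider, or equivalently the full schedule is about 23% narrower.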
So what does that all mean in terms of points?
Despite being at the mercy of chance, PDO has a tremendous impact on the outcome of games. We can peek into a single game and see the effects of PDO and Shot% on expected points (similar to win probability, but instead of loss/win, it's calculated from the standings points expected given a PDO and Shot% performance for a league average team).
Shot%: Shots For / (Shots For + Shots Against), All Strength data
The graph above plots all-strength PDO vs. expected points for a single game (excluding games ending in OT/SO), broken into 3 trend lines depicting the influence of Shot%. As you can see, Shot% is a modifier that does have an impact, but not nearly as much as the almost completely random PDO. In a bit of irony, both have an SD of 0.08 over a single game.
Multivariate Linear Regression Equations for Even-Strength, Excluding Empty Net Data
PDO-FEN: PDO with the inclusion of missed shots. SE: Standard Error of the model. The 82-game model's home variable had a few outliers pulling the coefficient out of its normal distribution.
The table above is the output of our multivariate linear regression of PDOFEN and Fenwick% to Standings Points-Per-Game. It's very important to remember that this is explanatory and not to be confused with prediction. Here we see correlation. If we used the above variables to predict future points, PDO's importance would drop to nearly nothing. I have created 10 models, each built from a randomly selected set of games. As the number of games increases, the coefficient for Fenwick% increases. This indicates that Fenwick% becomes increasingly important as more games are played. This is akin to being able to slightly load the dice. Consequently, the model improves as games increase.
Moving forward, we can apply the above to calculate the variance in points expected from both the talent and chance portions of each variable (PDOFEN, Fenwick%, PDO, and Shot%). This gives us an idea of how chance and talent will influence the final standings.
| Standard deviation (standings points) | 48 games | 82 games |
|---|---|---|
| EV, no empty net PDO-FEN (SD_Chance) | 4.79 | 6.45 |
| EV, no empty net PDO-FEN (SD_Talent) | 2.62 | 3.05 |
| EV, no empty net Fenwick% (SD_Chance) | 1.44 | 1.86 |
| EV, no empty net Fenwick% (SD_Talent) | 4.73 | 8.01 |
| All strength PDO (SD_Chance) | 5.39 | 6.83 |
| All strength PDO (SD_Talent) | 4.59 | 7.48 |
| All Strength Shot% (SD_Chance) | 1.94 | 2.47 |
| All Strength Shot% (SD_Talent) | 4.93 | 8.40 |
| Total Standings Points (SD_Chance) | 6.50 | 8.40 |
| Total Standings Points (SD_Talent) | 4.89 | 8.08 |

SD_Chance is the value of 1 standard deviation accounted for by chance. SD_Talent is the value of 1 standard deviation accounted for by talent. This is derived from the formula: Var(Talent) = Var(Observed) - Var(Chance)
PDO is all strength PDO, whereas PDO-FEN is EV, excluding empty net data, while including missed shots. Shot% is also all-strength, whereas Fenwick% is EV, excluding empty net data.
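The variance formula in the table notes is simple to apply. A minimal sketch follows, with made-up function names. The article does not list observed SDs, so the example works backwards from the Total Standings Points row (48 games) to show that the two pieces add up in variance, not in SD.

```python
import math


def talent_sd(observed_sd, chance_sd):
    """SD attributable to talent, from Var(Talent) = Var(Observed) - Var(Chance)."""
    talent_var = observed_sd ** 2 - chance_sd ** 2
    if talent_var < 0:
        raise ValueError("chance variance exceeds observed variance")
    return math.sqrt(talent_var)


def implied_observed_sd(talent_sd_value, chance_sd):
    """Inverse of the formula: total spread implied by the talent and chance SDs."""
    return math.hypot(talent_sd_value, chance_sd)


# Total standings points over 48 games, from the table: chance 6.50, talent 4.89.
total_48 = implied_observed_sd(4.89, 6.50)
print(f"implied observed SD: {total_48:.2f} points")
print(f"recovered talent SD: {talent_sd(total_48, 6.50):.2f} points")
```

Note that the implied observed spread (about 8.1 points) is well short of 6.50 + 4.89, because standard deviations combine as the square root of summed variances.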
I’ve converted all the variances into year-end standings points. Each data point represents 1 SD, or an approximation of the points spread to be expected from each variable. An example may help with interpreting the data:
If we start with a completely league average team, we expect 54 points at the end of the 48-game schedule. That same league average team is subject to odd bounces, streaks, etc., so their PDOFEN will fluctuate, causing a change in the year-end standings points of about ± 5 points, plus roughly ± 1.4 points due to random fluctuations in Fenwick%. Theoretically, if we asked this team to play out 100+ seasons of 48 games, we expect their year-end standings to fall between 49 and 59 about two-thirds of the time from PDOFEN randomness alone, and between roughly 52.6 and 55.4 from random Fenwick% fluctuation. That's the catch. Over an 82-game season we only have to worry about the random fluctuation of a small number of variables. Over a shorter season, many variables whose random fluctuation we routinely ignore (due to the large sample sizes) now need attention. Case in point: Fenwick%, a stat normally used to generate the most reliable of power rankings, will randomly fluctuate enough over the 48-game season to cause some teams problems.
Alternatively, assuming all the bounces even out for our example team, and we improve their PDOFEN performance to 1 SD above the mean, we expect a 2.62 point gain, or a season ending with about 57 points. If we improve their Fenwick% to 1 SD above the mean (due to talent alone = 0.527), then we expect a 4.73 point gain, and a year-end total of 59 standings points. As you can see, the value of being a team 1 SD (from talent alone) above the mean in Fenwick% is nearly identical to the value of having a lucky PDOFEN. That makes the 48-game schedule really precarious.
If, however, we run the same analysis for the usual 82-game season, we come up with ± 6.45 points due to random PDOFEN fluctuation, and ± 1.86 due to random Fenwick% fluctuation. Bumping that league average team’s PDOFEN to 1 SD above the mean (from talent alone) generates a measly 3.05 standings points, whereas 1 SD above the mean (from talent alone) in Fenwick% generates 8.01 points.
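The two walk-throughs above can be laid side by side with a few lines of code. This sketch copies the even-strength SDs from the table and uses the unrounded league-average baseline (1.117 points per game), so the 48-game numbers sit slightly below the rounded 54-point figures in the text. The dictionary layout and function name are illustrative.

```python
AVG_POINTS_PER_GAME = 1.117  # league-average pace quoted earlier

# (chance SD, talent SD) in standings points, copied from the table.
SD_POINTS = {
    48: {"PDO-FEN": (4.79, 2.62), "Fenwick%": (1.44, 4.73)},
    82: {"PDO-FEN": (6.45, 3.05), "Fenwick%": (1.86, 8.01)},
}


def summarize(games):
    """Return the baseline and, per stat, the 1 SD chance range and the
    total for a team 1 SD above the mean on talent."""
    baseline = AVG_POINTS_PER_GAME * games
    rows = []
    for stat, (chance, talent) in SD_POINTS[games].items():
        rows.append((stat, baseline - chance, baseline + chance, baseline + talent))
    return baseline, rows


for games in (48, 82):
    baseline, rows = summarize(games)
    print(f"{games} games, league-average baseline {baseline:.1f} points")
    for stat, low, high, talented in rows:
        print(f"  {stat:9s} chance range {low:.1f} to {high:.1f}, "
              f"+1 SD talent -> {talented:.1f}")
```

Reading the output, a talented Fenwick% team gains about as much as a lucky PDOFEN team over 48 games, while over 82 games the talent gain is clearly larger than the luck swing.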
Approximate distribution of standings points accounted for by chance for a league average team. About 2/3 of the time (1 SD) the hypothetical league average team will fall in the blue area, and 19/20 times in the blue and yellow (2 SD).
The figure above is a graphical representation of the table above, showing the distributions of points accounted for by chance for both the 48-game and 82-game schedules. What's interesting is that the 48-game sample has a tighter distribution than the 82-game sample when looking strictly at points (and not points per game). However, as mentioned above, talent (not shown above) is much less influential over 48 games than over 82.
Lastly, what we can conclude from everything above is that teams will be unable to separate out by talent, as there simply aren't enough games to do so, and the spread in points will be heavily influenced by chance. In addition, the spread will be much tighter than in an 82-game schedule, resulting in a mad fight for the playoffs.
As a caveat, it's impossible to truly separate a team’s standings points into these categories; we can only make educated guesses based on our viewing experience. But you can clearly see that an 82-game season is superior, not just because it provides more entertainment, but also because it allows truly talented teams to rise to the top.
Interestingly, if we allowed both to fluctuate, our expected distribution (1 SD) is the square root of the sum of the variances, (sqrt[5.39^2+1.94^2]=) ±5.73 points, or between 48.3 and 59.7 points from the combined randomness of all-strength PDO and Shot%. The output for Pts(SD_Chance) is 6.5, which is pretty close. Obviously the model isn't perfect, but it's nice to see some agreement there.
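The square-root-of-summed-variances step can be sketched as follows, assuming the two sources of randomness are independent. The function name is made up; the inputs are the 48-game SD_Chance values from the table, shown for both the all-strength pair and the even-strength pair.

```python
import math


def combined_sd(*sds):
    """SD of a sum of independent effects: sqrt of the summed variances."""
    return math.sqrt(sum(sd ** 2 for sd in sds))


BASELINE_48 = 54  # rounded league-average points over 48 games, as in the text

pairs = {
    "all-strength PDO + Shot%": (5.39, 1.94),
    "EV PDO-FEN + Fenwick%": (4.79, 1.44),
}

for label, (a, b) in pairs.items():
    sd = combined_sd(a, b)
    print(f"{label}: +/- {sd:.2f} points "
          f"({BASELINE_48 - sd:.1f} to {BASELINE_48 + sd:.1f})")

print("Total Standings Points (SD_Chance) from the table: 6.50")
```

Both combinations come in under the 6.50 in the table, which is what you would expect if a little chance variation is left unexplained by the two stats.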