Weekly Playoff Probabilities: Fairy Tales, Fenwick, and the Accuracy Paradox

“Mirror, mirror on the wall, who is the fairest of them all?” In the tale of Snow White, first collected by the Brothers Grimm in 1812, Snow White’s evil stepmother asks the mirror this time and again, and it always replies, “You, my Queen, are fairest of all.” Scorned by the sands of February, we in the modern era find ourselves asking the same question, reaching out in desperation: “Fenwick! Fenwick! Who is the fairest of them all?” “You, my Sharks,” the mirror replies, “you are one of the fairest of them all.”

But on the day that Snow White reached 17, the mirror no longer repeated its soothing refrain, replying to the queen, “Queen, you are full fair, ’tis true, but Snow White is fairer than you.” In a jealous rage the queen demands the lungs and liver of Snow White. With knife in hand, prepared to take her life, the huntsman sent by the queen is struck by Snow White’s beauty and instead alerts her to her stepmother’s evil intent. Lost within the forest, Snow White collapses, only to be found by a group of dwarfs who pity her feeble condition. Through the troubles of February, we, like Snow White, have fallen to waste in the forest: lost, breathless, without the hope of playoff home-ice advantage we enjoyed at the All-Star break in years prior. We seek sustenance, and may find solace only in the cottage of a winning streak.

Three times the queen seeks out Snow White to finish the deed, once as a peddler and once as an old maid. Twice the dwarfs are able to revive our dear Snow White. Alas, in what is no doubt the forbidden fruit of the Garden of Eden, the disguised stepmother offers a poisoned apple, a temptation Snow White cannot refuse. She collapses to the ground in a stupor, just as her innocence, the innocence of her youth, likewise escapes her. Time passes without motion from Snow White. She is trapped in a world that has no existence: the passage from the life of youth to the burden of womanhood, with which comes the knowledge of life, love, and sorrow. Here we have found advanced hockey statistics that bring knowledge, but with it danger, for we know not their full power. With time truth will be forced to lay down her hand, but now is not that time, and we abide, just like Snow White, encased in our coffin of uncertainty.

At last a prince, enchanted by Snow White’s beauty, frees her with a kiss from the glass coffin that binds her, the gesture that fulfills her journey. As we move on from the innocence of PDO, forget not life. For it is not through the relief of victory that we achieve deliverance, but through the battle together. We breathe, we anguish, we laugh, we believe, and therefore we know we are alive!

All artistic liberties and analogies aside, this week we will explore the concept of predictivity, but first, playoff probabilities.

Western Predicted Final Standings

Final Standings	Team	Score Adj Fenwick%	Playoff Probability	Mean Points	Mean Wins	Mean Ties	Mean Losses	Change in PP
1	Detroit Red Wings	54.54	100.0%	110.78	52.87	5.05	24.09	0.0%
2	Vancouver Canucks	51.79	100.0%	109.33	49.69	9.96	22.36	0.0%
3	San Jose Sharks	52.59	81.8%	94.90	42.85	9.20	29.95	-15.6%
4	St. Louis Blues	53.69	100.0%	109.12	50.09	8.95	22.97	0.0%
5	Nashville Predators	46.96	97.4%	99.50	45.22	9.05	27.73	13.3%
6	Chicago Blackhawks	52.10	88.3%	95.92	43.48	8.96	29.56	-5.4%
7	Phoenix Coyotes	49.44	72.6%	93.73	41.32	11.09	29.59	3.9%
8	Dallas Stars	50.45	60.9%	92.35	42.63	7.09	32.28	36.0%
9	Los Angeles Kings	51.33	54.9%	91.60	38.75	14.09	29.15	-5.3%
10	Colorado Avalanche	50.08	26.2%	88.71	41.37	5.97	34.66	1.3%
11	Calgary Flames	47.30	13.5%	86.65	36.78	13.09	32.13	-19.2%
12	Anaheim Ducks	48.92	2.0%	82.91	35.48	11.95	34.57	-5.4%
13	Minnesota Wild	45.93	2.4%	82.80	35.37	12.06	34.57	-3.5%
14	Edmonton Oilers	48.26	0.0%	75.33	33.56	8.21	40.23	-0.1%
15	Columbus Blue Jackets	48.11	0.0%	64.38	27.66	9.07	45.28	0.0%

Eastern Predicted Final Standings

Final Standings	Team	Score Adj Fenwick%	Playoff Probability	Mean Points	Mean Wins	Mean Ties	Mean Losses	Change in PP
1	New York Rangers	49.51	100.0%	109.59	50.13	9.34	22.54	0.02%
2	Boston Bruins	52.65	99.8%	102.11	48.39	5.33	28.28	0.29%
3	Florida Panthers	49.60	73.1%	92.04	38.91	14.22	28.87	6.29%
4	Pittsburgh Penguins	53.83	100.0%	103.59	48.20	7.18	26.62	0.82%
5	Philadelphia Flyers	51.64	98.6%	99.14	44.90	9.33	27.77	1.27%
6	New Jersey Devils	50.49	96.7%	97.25	45.01	7.23	29.76	-0.50%
7	Ottawa Senators	49.91	86.6%	94.00	42.01	9.98	30.01	12.19%
8	Winnipeg Jets	50.68	41.9%	88.74	39.38	9.97	32.64	-0.37%
9	Washington Capitals	49.03	41.6%	88.52	40.65	7.21	34.14	-1.93%
10	Buffalo Sabres	48.65	21.3%	86.34	38.12	10.10	33.78	16.81%
11	Tampa Bay Lightning	48.57	20.3%	86.18	39.04	8.10	34.86	9.58%
12	Toronto Maple Leafs	48.98	16.2%	85.41	38.16	9.09	34.75	-26.49%
13	New York Islanders	48.66	2.1%	81.05	34.98	11.10	35.93	-9.16%
14	Carolina Hurricanes	48.73	1.7%	80.43	32.17	16.09	33.74	-1.22%
15	Montreal Canadiens	48.72	0.2%	77.63	32.83	11.98	37.19	-7.61%

The last column represents the change in playoff probability from 2/20/12. Full discussion of the model here. Full probabilities can be found here.

This week BReynolds broke his silence regarding the Minnesota Wild’s season. Kudos to him for standing up for what he believes. “Stat Guys” are as guilty as everyone else of cherry-picking examples to showcase advanced statistics. While I don’t believe that anyone who provided the primary research on the Wild’s early success intended any harm to the Wild fanbase, the argument clearly got heated and way out of hand. Now people are still dragging Minnesota through the mud, and it’s just not needed. We all know what happened. Don’t be guilty of cherry-picking one example. For every story of a team playing above its Fenwick, there’s a team not playing up to it.

Reynolds goes on to say something poignant, which I think bears repeating:

Another way to dismiss an argument is to say that the people making it are biased. I admit that bias. When the “stats guys” claim they have no bias, they are lying to you, and to themselves. They are trying to protect their religion, and will fight anyone to do it. They all use the same links, the same posts as their basis of fact, but no one has ever bothered to consider the basics might just be fallible.

…the article uses the example of a ball being thrown up in the air will always return to earth, no matter the arm a person throwing it has. This is to draw an analogy to gravity, and show how regression is a guarantee, just like the ball coming back to earth.

The problem here? The stats are not a certainty. Gravity is. No matter what type of Star Trekian spaceship you build, gravity can never be escaped. Ever. Scientifically proven. There is a reason they are called “laws” of physics and not “suggestions” of physics.

Stats can be overcome. The Bad News Bears can win. The Mighty Casey can strike out. People beat cancer when the numbers say they don’t have a chance. Stats are overcome every single day. Matt Kassian can score two goals in a single NHL game.

He’s right. There exists an immense amount of gray area in statistics that gets largely ignored. There is no guaranteeing regression to the mean; we only expect to observe it over a large sample of data. Therein lies his point. We don’t watch hockey in large season-long samples, we watch it game by game, when the butterfly is in full effect.

This brings me to the stat part of my article. We’re going to talk a bit about predictivity, the accuracy paradox, and why you can’t just sledgehammer any piece of data with a least squares regression. As is pointed out elegantly by Burke, prediction is predicated on two factors, the second of which is often forgotten. If we want to find the best predictor of future success (often defined as Point%, or expected points in the NHL), we not only have to take the correlation of that variable with wins (or Point%), which we may define as “validity”, but we also need to know the reliability of that variable. This concept is best understood with the following graphic.

As you can see, something that is reliable (lower left, lower right) can be repeated, even if it’s not perfectly on target. In essence we assess reliability by how close the darts are to each other. Validity is synonymous with accuracy. We take how far each dart is from the center, and see on average how close we are. As you can see from the top right panel, something can be accurate without being reliable, because the average of those darts comes out to the same result as the dart board in the bottom right.

Let’s take that concept and apply it to our desire to predict future hockey wins. Remember that before we can apply a regression, we need to know our two factors: reliability (i.e., if we measure the stat again over the same sample, will it return the same result?) and validity (i.e., does it predict winning?). We can summarize this in the equation

predictivity = r(self)*r(win)
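To make the formula concrete, here is a tiny Python sketch. The function name and the numbers plugged in are made up for illustration; they are not values from my data.

```python
def predictivity(r_self: float, r_win: float) -> float:
    """Reliability times validity, as in the equation above."""
    return r_self * r_win

# Hypothetical values, chosen only to illustrate the trade-off:
# a stat that tracks winning closely but bounces around a lot...
noisy_but_valid = predictivity(r_self=0.30, r_win=0.60)
# ...versus one that tracks winning less closely but repeats well.
steady_but_weaker = predictivity(r_self=0.80, r_win=0.40)

print(round(noisy_but_valid, 2), round(steady_but_weaker, 2))
```

With these invented inputs the steadier stat comes out ahead (0.32 versus 0.18), even though its raw correlation with winning is lower.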

Let’s first take a look at validity. I compiled data from the 2007 through 2010 seasons, analyzing each team’s game-by-game 5v5 tied Fenwick%, 5v5 tied goal%, and Point%. I then ran a regression for each variable at each game of the season, to see how the value at that point in the season correlated with results over all remaining games on the schedule. Here’s what it looks like all excely.
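For anyone who wants to try this at home, the validity calculation can be sketched roughly as follows. This is not my actual spreadsheet: the function names are hypothetical, the data are randomly generated, and averaging single-game values is a simplifying assumption standing in for a true cumulative percentage.

```python
import math
import random

def pearson(xs, ys):
    """Plain Pearson correlation; returns 0.0 if either list has no variance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)

def validity_curve(stat_by_game, points_by_game, max_points=2):
    """For each cut-off game n, correlate every team's mean stat through
    game n with its Point% over all remaining games.

    stat_by_game[t][g]   : team t's single-game stat in game g
    points_by_game[t][g] : standings points team t earned in game g
    """
    n_games = len(stat_by_game[0])
    curve = []
    for n in range(1, n_games):  # always leave at least one game remaining
        to_date = [sum(s[:n]) / n for s in stat_by_game]
        rest = [sum(p[n:]) / (max_points * (n_games - n)) for p in points_by_game]
        curve.append(pearson(to_date, rest))
    return curve

# Synthetic league (assumption): 30 teams, 82 games, no overtime points.
random.seed(0)
talent = [random.gauss(0.5, 0.03) for _ in range(30)]
stat = [[random.gauss(t, 0.08) for _ in range(82)] for t in talent]
points = [[2 if random.random() < t else 0 for _ in range(82)] for t in talent]

curve = validity_curve(stat, points)
print(len(curve))  # one correlation per cut-off game
```

Plotting `curve` against the cut-off game gives one line of the kind of chart described here; repeat for each stat to compare them.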

You might be scratching your head wondering: wait, isn’t Fenwick more predictive? Aha! Yes. But as you can see, if we run a simple regression of these variables, Point% pretty much hangs with Fenwick damn near the whole season, overtaking Fenwick at about game 60. The only window where Fenwick is more predictive is around games 25-50. Herein lies the accuracy paradox. So let’s return to our original purpose and look deeper.

Here is the graph for reliability. For the first 41 games we can perform a test called “split-half reliability” to determine the reliability [i.e., r(self)] value for each variable. After that there are a variety of tricks to get at reliability. I used a combination of Tom Tango’s equation and the binomial equation to estimate reliability for all games following game 41.
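A split-half test can be sketched like this. The odd/even split and the function names are my assumptions for illustration, and the data are synthetic; the Tango and binomial estimates used past game 41 are not reproduced here.

```python
import math
import random

def pearson(xs, ys):
    """Plain Pearson correlation; returns 0.0 if either list has no variance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)

def split_half_reliability(stat_by_game, n):
    """Correlate, across teams, two disjoint n-game samples taken from the
    first 2n games (even-indexed games versus odd-indexed games).
    Only possible while 2n fits inside the season, hence the 41-game limit."""
    if 2 * n > len(stat_by_game[0]):
        raise ValueError("need two disjoint n-game halves")
    half_a = [sum(s[0:2 * n:2]) / n for s in stat_by_game]
    half_b = [sum(s[1:2 * n:2]) / n for s in stat_by_game]
    return pearson(half_a, half_b)

# Synthetic league (assumption): 30 teams, 82 games of a noisy per-game stat.
random.seed(1)
talent = [random.gauss(0.5, 0.03) for _ in range(30)]
stat = [[random.gauss(t, 0.08) for _ in range(82)] for t in talent]

r_self_by_n = [split_half_reliability(stat, n) for n in range(2, 42)]
print(len(r_self_by_n))  # one reliability value for each n from 2 to 41
```

Asking for `n = 42` raises an error, which is exactly why a different estimate is needed for the back half of the season.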

Now we’re getting somewhere. We can see that Fenwick is much more reliable than goals and points. This is largely because there are so many more Fenwick events (goals, shots, and misses) per game than goals alone. Points is more reliable than goals here because it’s somewhat of a summary stat. It incorporates many stats together, thus increasing its reliability. Also note that I’m using 5v5 tied Fenwick, which is a very limited Fenwick sample. Score-Adjusted Fenwick would in all likelihood be higher than the blue line.
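The event-count argument can be illustrated with the textbook binomial standard error. The event totals below are invented round numbers, not counts from my data set, and treating each event as an independent coin flip is a simplification.

```python
import math

def share_standard_error(p: float, n_events: int) -> float:
    """Binomial standard error of an observed share built from n_events trials."""
    return math.sqrt(p * (1 - p) / n_events)

# Hypothetical totals for an evenly matched team over the same stretch of games:
# 40 goals (for plus against) versus 400 Fenwick events.
se_goals = share_standard_error(0.5, 40)
se_fenwick = share_standard_error(0.5, 400)

print(round(se_goals, 3), round(se_fenwick, 3))
```

Ten times the events cuts the sampling noise by a factor of about 3.2 (the square root of 10), roughly 0.079 versus 0.025 here, which is the intuition behind Fenwick repeating better than goals.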

I assume you know where I’m going with this now. Let’s at last look at r(self)*r(win) to see our predictivity factor projected over a full season. The graph below represents how well a variable at any point in the season predicts the remaining games.

Fenwick’s predictive ability is derived not so much from its correlation with winning as from the sheer amount of information it provides us. From the graph we see that we can’t really predict shit until about the 20-game mark. Fenwick is clearly superior to the other measures for a large stretch, until about game 60 when Point% catches up, and then they all fall due to the drop in the number of remaining games (i.e., small sample).

Lastly, I want to get back to BReynolds’s point about stats. As you can see here, it’s not like there is Fenwick and there is nothing else. Point% gives reasonable predictive power, especially over the 40-60 game mark. Clearly there is a continuum, based on games played and games remaining, that dictates the overall predictive power of any stat. I know the MIN example showcases the extremes (high Point%, low Fenwick), but to cherry-pick one example for the whole season isn’t doing justice to the nature of prediction itself. It’s inherently dirty and messy. Our ability to predict future winning is like trying to drive a car by looking through the rear-view mirror. It’s far from perfect.

Talking Points