Mckayla is not impressed
We're not shy about our stance on possession. While it may not be everything, it's pretty damn close. Possession is critical to success, but it's also a basket term. It refers to the ability to drive play forward, but there are a multitude of scenarios in which this can occur. Perhaps it's an ability to maintain possession in the offensive zone, initiate a successful breakout, or hold the defensive blueline. Coaches and players spend hours and hours in the video room going over all of these specific game scenarios, yet currently we can only group them together into one loosely defined metric: Corsi (the total of all goals, shots, missed shots, and blocked shots for a team, minus its opponent's) or its slightly more predictive cousin, Fenwick. At least to my knowledge, no one has conclusively proved that maintaining possession in the offensive zone is more important than killing a breakout, or any other of these possible scenarios.
Outside of what is the current gold standard of tracking zone entries, we may be able to use surrogates to estimate zone specific talent. During last year's playoffs I suggested using Corsi% following a defensive zone faceoff win as a way to look at neutral zone play. Those results were interesting, but covered far too few games to really show us anything (and, as you'll see below, a defensive zone win probably wasn't the best choice). Over the summer I've had the chance to compile Corsi events following every possible faceoff scenario before any player has changed (to eliminate dump and change noise). My hope with parsing this data is to get a better understanding of the importance of defensive, neutral, and offensive zone play independent of each other. I originally thought that a defensive zone win would be significant, but as it turns out, it is neutral zone play that steals the show.
It may be most helpful to skip to the discussion section and then return to methods or results
The difficulty with trying to quantify neutral zone play is in the natural fluidity of the game. Currently, the NHL doesn't track the location of the puck, so we can only estimate its location from play-by-play (PBP) events. PBP does give us starting faceoff locations, allowing us to break data into zone, faceoff, and first shift components. I define "1st shift" as all events that occur after a faceoff and prior to any player changing. I use the 1st shift because it eliminates any noise "dump and change" might introduce into the data. The great thing about 1st shift data is that we have a starting location for the puck, and we can then track Corsi events to give us an idea of where the puck stayed for most of that shift.
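As a rough illustration of the "1st shift" definition, here is a minimal Python sketch. It is not the parser used for this study. The `Event` record, its field names, and the event codes are assumptions made up for the example, and it assumes every event is credited to the team that attempted the shot.

```python
from dataclasses import dataclass

# Event codes counted as Corsi events (hypothetical labels for this sketch).
CORSI_TYPES = {"GOAL", "SHOT", "MISS", "BLOCK"}

@dataclass
class Event:
    kind: str           # "FAC" for a faceoff, or a Corsi/other event code
    team: str           # team credited with the event (assumed: the shooting team)
    skaters: frozenset  # every skater on the ice, both teams

def first_shift_corsi(events, team):
    """Return one (corsi_for, corsi_against) pair per faceoff, counting only
    events that happen before any skater on either team has changed."""
    results = []
    on_ice = None  # skaters at the last faceoff; None once someone has changed
    for ev in events:
        if ev.kind == "FAC":
            on_ice = ev.skaters
            results.append([0, 0])
        elif on_ice is not None:
            if ev.skaters != on_ice:
                on_ice = None  # a change occurred: the rest is non-1st shift
            elif ev.kind in CORSI_TYPES:
                results[-1][0 if ev.team == team else 1] += 1
    return [tuple(r) for r in results]
```

Comparing the full set of skaters on each event is a coarse stand-in for shift charts: it only notices a change when the next recorded event shows different players.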
To grab the data I parsed the play-by-play data for every game for the last 5 seasons into Corsi events by 1st shift after a faceoff, even strength only. (BTW this was my first attempt at coding in Python; to all the programmers out there, let me apologize in advance.) I could have used TOI data to find the exact time that players changed, but instead relied on PBP skater data to determine when a change occurred. As a result I wasn't able to capture Corsi events per unit of time, and instead used Corsi events per shift. This still allows me to control for quantity of shots and the impact of faceoff wins. I then pulled total Corsi, Fenwick Close, and Pts data from a database I had created previously. Subtracting 1st shift Corsi events from total even strength Corsi events leaves me with all non-1st shift even strength Corsi events for every game. I then randomly split every team's games into 2 samples, capturing roughly 40 independent games per sample, with equal amounts of home and away games, for almost all teams (5,812 games in all). I summed all events within each sample, giving me roughly 80 games' worth of data (split randomly in half) for each team. That leaves me with about 5 seasons x 30 teams x 2 independent samples = 300 data points. The variables of interest included the 6 categorized 1st shift variables defined by faceoff location and whether the team won the faceoff or not, Fenwick Close, Pts, and non-1st shift Corsi. I then used these data sets to run auto-correlations and regressions of the data against out-of-sample Fenwick Close and Pts.
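The random split described above can be sketched roughly as follows. This is an illustration, not the code used for the study: the game records and their keys (`team`, `home`, `cf`, `ca`) are hypothetical, and shuffling within each team's home and away games is just one assumed way to keep the two samples balanced.

```python
import random
from collections import defaultdict

def split_games(games, seed=0):
    """Randomly split each team's games into two samples, balancing home and
    away games, and sum Corsi For/Against within each sample.

    games: list of dicts with hypothetical keys 'team', 'home', 'cf', 'ca'.
    Returns (sample_a, sample_b), each mapping team -> summed totals.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for g in games:
        strata[(g["team"], g["home"])].append(g)

    def empty_totals():
        return {"games": 0, "cf": 0, "ca": 0}

    samples = (defaultdict(empty_totals), defaultdict(empty_totals))
    for (team, _home), group in strata.items():
        rng.shuffle(group)
        # Deal the shuffled games alternately into the two samples.
        for i, g in enumerate(group):
            totals = samples[i % 2][team]
            totals["games"] += 1
            totals["cf"] += g["cf"]
            totals["ca"] += g["ca"]
    return samples
```

With 41 home and 41 away games, this gives one sample 42 games and the other 40, which is in line with "roughly 40 games per sample".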
This last piece of study design is huge for anyone contemplating analysis. In hockey, if you run dependent variables with independent variables from the same sample (eg. every game from a season) you will understandably find that goals for and against are the most important predictors of winning. It makes obvious sense, because team pts are assigned by the goal diff at the end of a game. However, we're not concerned with intra-sample wins. We want to predict future wins. So we can't use the same games for the independent variables that give us the dependent variables (eg. we can't use goals scored for games 1-82 to predict wins for games 1-82). You can either split the season into even and odd games, or randomly split it into 2 halves, and then run those 2 samples against each other, thus predicting future (or past) wins with data not from those games. In addition, you really need a large sample of data, multiple seasons for teams, or >150 for players generally, to have significant results.
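To make the out-of-sample idea concrete, here is a small Python sketch. It is an assumed setup, not the exact procedure used here: each team-season is taken to be a pair of dicts (one per half), and the keys `predictor` and `outcome` are hypothetical column names. The predictor from one half is always paired with the outcome from the other half, so no game appears on both sides.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def out_of_sample_r(rows, predictor, outcome):
    """rows: list of (half_a, half_b) dicts, one pair per team-season.

    Each team-season contributes two points: A predicts B, and B predicts A.
    Passing the same name for predictor and outcome gives the split-half
    auto-correlation, i.e. r(self).
    """
    xs, ys = [], []
    for a, b in rows:
        xs += [a[predictor], b[predictor]]
        ys += [b[outcome], a[outcome]]
    return pearson_r(xs, ys)
```

Using both directions is what turns 150 team-seasons into 300 data points.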
I'll first present some interesting generalities about 1st shift data, and then move on to zone specific results.
It turns out that even strength play accounts for about 80% of the game. Of that time, the first shift after a faceoff is roughly 40% of even strength time (or 32% of all TOI). If we break down 1st shift Corsi events, it roughly follows a similar pattern, accounting for approximately 36% of all even strength Corsi events.
In addition we can track where Corsi events are more likely to happen. It's intuitive that a team will generate more "Corsi events for" (meaning all goals, shots, misses, and blocked shots for a team) with an offensive zone face off win, but let's break down each 1st shift to get a better picture.
1st Shift "Corsi Events For" by Zone and Faceoff outcome
| ||O-zone win||N-zone win||D-zone win||D-zone loss||N-zone loss||O-zone loss||Non 1st shift|
|Mean "Corsi For" events per game||4.62||2.92||1.76||1.34||2.17||2.09||26.71|
|% of total||0.31||0.20||0.12||0.09||0.15||0.14||0.64|
|Shifts per game||9.34||9.93||9.31||9.34||9.93||9.31|| |
The table above isn't shocking, but it does show some interesting things.
- Half of all 1st shift "Corsi events for" come off of an offensive zone faceoff win or a neutral zone faceoff win. That's a lot, considering those 2 puck starting locations account for only a third of all puck starting locations.
- It appears that teams generate more "Corsi events for" following a neutral zone win than when they lose a draw in the offensive zone (N-zone win = 2.92 vs. O-zone loss = 2.09).
- Outside of losing a defensive zone draw (1.34), the other puck starting locations are roughly similar, generating 2.17, 2.09, and 1.76 "Corsi events for" per game from a lost neutral zone draw, lost offensive zone draw, and won defensive zone draw, respectively.
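The shares quoted above can be re-derived from the means in the table. The short Python check below uses only the table's numbers. It takes the six 1st shift "% of total" values as shares of 1st shift events and the last column as a share of all even strength events, which is what the arithmetic implies.

```python
# Mean "Corsi For" events per game, copied from the table above.
cf_per_game = {
    "O-zone win": 4.62, "N-zone win": 2.92, "D-zone win": 1.76,
    "D-zone loss": 1.34, "N-zone loss": 2.17, "O-zone loss": 2.09,
}
non_first_shift = 26.71

# Shifts per game, also from the table.
shifts_per_game = {
    "O-zone win": 9.34, "N-zone win": 9.93, "D-zone win": 9.31,
    "D-zone loss": 9.34, "N-zone loss": 9.93, "O-zone loss": 9.31,
}

first_total = sum(cf_per_game.values())

# Share of 1st shift Corsi For events by starting situation.
shares = {k: round(v / first_total, 2) for k, v in cf_per_game.items()}

# Share of all even strength Corsi For events that come on a 1st shift.
first_share_of_all = round(first_total / (first_total + non_first_shift), 2)

# Share of 1st shift starts that are O-zone or N-zone faceoff wins.
win_starts = shifts_per_game["O-zone win"] + shifts_per_game["N-zone win"]
win_start_share = round(win_starts / sum(shifts_per_game.values()), 2)

print(shares)
print(first_share_of_all, win_start_share)
```

The O-zone win and N-zone win shares add up to about 0.51 of 1st shift events, from about a third of the starts. 1st shift events as a whole come to roughly 36% of even strength "Corsi For" events.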
Zone specific performance, however, is a much different beast. Yes, from above we see that teams generate a lot of production from specific starting points, but that doesn't necessarily indicate that team performance from those locations is 1) repeatable or 2) predictive of future wins. As always tables first.
"1st shift" Corsi by Zone data
|O-Zone W Diff||N-Zone W Diff||D-zone W Diff||D-zone L Diff||N-Zone L Diff||O-zone L Diff||O-zone Diff||N-zone Diff||D-zone Diff||Non 1st Corsi Diff||Fen Close%||Pts/Game|
O-zone W Diff = 1st shift Offensive zone faceoff win Corsi per shift differential, (ie. [Corsi For - Corsi Against] after a faceoff win in the offensive zone before any player for either team has changed). O-zone Diff = O-zone W Diff + O-zone L Diff (ie. Corsi independent of winning or losing the faceoff). Non 1st Corsi Diff = All remaining Corsi For events after any player from either team has changed - All remaining Corsi Against events after any player from either team has changed. Fen Close% = Fenwick Close%. PTS/game = Team standings points per game. SD = Standard deviation.
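For readers who prefer code to definitions, the column definitions above can be sketched in Python. The function names and the example numbers are made up for illustration. The sketch reads "Corsi per shift differential" as (Corsi For - Corsi Against) divided by the number of shifts, which is an assumption.

```python
def corsi_per_shift_diff(cf, ca, shifts):
    """[Corsi For - Corsi Against] per 1st shift for one faceoff scenario."""
    return (cf - ca) / shifts

def zone_diff(win, loss):
    """Zone Diff = faceoff-win Diff + faceoff-loss Diff for the same zone.

    win, loss: dicts with hypothetical keys 'cf', 'ca', 'shifts'.
    """
    return corsi_per_shift_diff(**win) + corsi_per_shift_diff(**loss)

def non_first_corsi_diff(total_cf, total_ca, first_cf, first_ca):
    """Even strength Corsi differential left after removing 1st shift events."""
    return (total_cf - first_cf) - (total_ca - first_ca)

# Hypothetical totals for one team's offensive zone draws.
o_zone_win = {"cf": 30, "ca": 10, "shifts": 80}
o_zone_loss = {"cf": 10, "ca": 30, "shifts": 160}
print(zone_diff(o_zone_win, o_zone_loss))
```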
The table above shows the impact of starting location and faceoff outcome on Corsi. A few notes that interest me.
- Neutral zone performance stands out the most to me. It has the highest SD, and correlates most strongly with both Fenwick Close% and team pts/game. In fact, the correlation between neutral zone performance and winning is greater than for Non-1st Corsi, of which there are far more events per game. This suggests that neutral zone performance is very indicative of success.
- As expected the mean Corsi for neutral zone is 0. Non-1st shift is also 0, but with a much lower SD, and higher auto-correlation, ie. r(self). This is likely the result of there being many more Non-1st Corsi events. These 2 metrics look very similar, but there doesn't seem to be any interaction between the two. Both seem to be important predictors of future wins.
- Offensive zone performance is repeatable. But as we'll see below, isn't the best predictor of future Fenwick close% or Pts/game.
These linear regressions allow us to isolate the impact of each zone performance on Fenwick Close% and pts/game. I'm digging the bullet points, let's keep going with that.
- Despite including 5 seasons and all 150 team-seasons, the regression shows that I can't detect a difference in zone performance to a statistically significant degree. Perhaps there may be another way to use NHL play-by-play data to get more information (eg. including "Hits"), but it looks to me like there is a real need for tracking neutral zone performance. We won't be able to use what the NHL currently tracks to definitively elucidate zone specific performance.
- The linear regressions above do suggest that neutral zone performance is important. If we hold all other variables constant (ie. isolate only neutral zone performance), a team that is 2SD above the mean in neutral zone performance (a really good team) boosts their Fenwick Close% by 2% and Pts/game by about 0.49, or roughly 7.15 points (likely between 4 and 10 points above average).
- The linear regression for points/game shows O-zone diff to be non-statistically significant. I really have no good explanation for this. A linear regression using teams above 0.5 Fenwick Close% (ie. teams that are above average, not shown here) indicates the only statistically significant variables are neutral zone Corsi diff and non-1st shift Corsi diff.
- Non-1st shift Corsi events show their true colors in these regressions as highly important. It would make logical sense that performance after an on the fly line change, which involves a lot of neutral zone play, would be important given our earlier findings.
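The "2SD above the mean" reading of a regression coefficient in the bullets above can be sketched as a one-liner. This is a generic illustration, not a reproduction of the regressions in this post: the coefficient and the sample values below are hypothetical placeholders.

```python
from statistics import stdev

def two_sd_effect(coef, values):
    """Predicted change in the outcome when one predictor moves 2 standard
    deviations above its mean and every other predictor is held constant."""
    return coef * 2 * stdev(values)

# Hypothetical neutral zone differentials and a hypothetical coefficient.
n_zone_diffs = [-0.1, 0.0, 0.1]
print(two_sd_effect(0.5, n_zone_diffs))
```

In a linear model, the other predictors drop out of the difference, so the effect is just the coefficient times the size of the move.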
A few recent studies have suggested the importance of neutral zone play. While this study hasn't unequivocally shown the neutral zone to be more predictive of future success, I think it's safe to suggest, together with the other articles linked, that Corsi differential (Corsi For - Corsi Against) after a neutral zone draw is a good indicator of future success, more so than performance in either the offensive zone or defensive zone. Unfortunately it also seems apparent that NHL PBP data alone can't be used to sufficiently delineate the importance of zone specific performance. Much in the same way the scoring chance project ultimately showed the utility of Corsi/Fenwick as a surrogate for scoring chances, the zone entries project might be able to prove the importance of tracking 1st shift data. At this point, the data is suggestive, but not definitive. (Eric T. was kind enough to send me 3 games of zone entry tracked data. The correlation between N-zone Corsi Diff and Zone Entries (r = 0.85) is encouraging, but 3 games is a very small sample.)
The conclusions drawn from this study may have some people questioning the deployment of players in zone specific roles. But remember from above that a substantial amount (>50%) of 1st shift offensive production comes after an offensive or neutral zone win, obviously having a huge impact on game outcomes. Simply put, neutral zone performance predicts future wins; offensive and defensive zone performance determines wins. I don't think this data really changes anything for zone starts. Coaches who deploy their players in zones best suited to their talent are efficiently utilizing the resources on their roster. If given an option for a player who performs equally well in the offensive and neutral zone, then perhaps choosing the neutral zone over the offensive zone may be a better choice.
Some practical take home points.
- Full discovery of the importance of neutral zone performance will require tracking something the NHL currently does not. Join the Zone Entries Project!
- Corsi after a neutral zone faceoff, without changing any skaters, is predictive of future success. Likely more so than Corsi after offensive or defensive zone faceoffs.
- Corsi after a change has occurred (Non-1st Corsi Diff) is likely to be similar to Corsi after a neutral zone faceoff. It is more reliable, but less predictive of future wins.
- Over 50% of 1st shift "Corsi For" events happen after either an offensive zone faceoff win or neutral zone faceoff win.
And lastly, the obligatory "rank" table, click the column headers to sort.
2011-2012 1st Shift Corsi Diff by Zone
|Team||O-zone Diff||N-zone Diff||D-zone Diff||Non 1st Corsi Diff||Fen Close%||Pts/Game*|
Columns are as above, table is sortable
*Some values may differ slightly from official results due to incomplete PBP data
For Sharks Fans
Although I don't have player data on hand to comment on, the numbers above do show some trends consistent with our evaluation of the 2011-2012 Sharks. The Sharks were above average in every category (usually hovering around 10th in the league) except defensive zone Corsi diff. In fact, they ranked 25th in that category (-0.83 SDs). Clearly this represents not only a personnel issue (a combo of poor performance, injury, and perhaps poor luck), but also likely some amount of system failure. Outside of this category the Sharks did well, balancing skill in many facets and vaulting their performance to 7th best Fen Close% in the league. An improved defense (as projected) as well as avoiding injuries should boost defensive zone Corsi diff for this upcoming year (maybe?).