I'm not a big stathead, but I do like pretty pictures and having someone explain stuff in easy-to-understand terms. I spend most of my time on Athletics Nation, where there are plenty of people who do great stat work. Recently there was some talk about R, the free statistical and graphing software. I had played around with Matlab back in college and with R a few years ago, so I thought I'd give it another try. To start with something simple, I decided to look at the NHL goals for/against distribution as a scatterplot, play around with the data, and see how the Sharks stack up.
To get started, I grabbed the numbers from NHL.com's standings chart. All the data is current as of the start of the All-Star break. Every team has played just about 50 games.
The team dots are color-coded by division, and each division leader has a black outline on its dot. The red line is the linear regression best-fit line. The blue line is the "locally weighted polynomial regression." Whatever that means exactly, each part of that line is a best fit to the data near that point of the line, instead of all of it.
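For the curious, here is roughly what a straight best-fit line boils down to. This is a minimal Python sketch, not the R code behind the chart, and the goal totals in it are made-up placeholders rather than the real standings. In R, `lm(GA ~ GF)` fits the same kind of line.

```python
# Ordinary least-squares line through (goals for, goals against) points.
# The numbers below are hypothetical placeholders, not the real standings.

def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of x around its mean, and how x and y move together.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

goals_for = [120, 130, 140, 150, 160]      # hypothetical
goals_against = [155, 150, 140, 138, 127]  # hypothetical

slope, intercept = fit_line(goals_for, goals_against)
print(f"slope={slope:.3f}, intercept={intercept:.1f}")
```

With goals against on the Y-axis, a negative slope just says that teams that score more tend to allow fewer.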
As we go along the X-axis, we go from teams that score less to teams that score more. I reversed the Y-axis from what you would normally see, so that as we go up the Y-axis, we go from teams that allow more goals to teams that allow fewer. It seems more natural to me that way, making the most desirable location the upper-right quadrant. Not surprisingly, the division leaders tend toward that area.
I don't want to analyze this specific graph too much, because I made a second one that uses Goals For and Goals Against per-game averages. This nullifies the difference in games played among the 30 teams. Most teams stay in the same general areas, but there is some jostling about.
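Here is a small Python sketch of why the per-game version matters when teams have played different numbers of games. The team names and numbers are hypothetical, picked only to show that the order can flip.

```python
# Per-game averages from goal totals and games played.
# Team names and numbers are hypothetical placeholders, not real standings.
teams = {
    "Team A": {"gp": 52, "gf": 150, "ga": 140},
    "Team B": {"gp": 49, "gf": 148, "ga": 135},
}

def per_game(total, games_played):
    """Average per game, e.g. goals for per game."""
    return total / games_played

# Map each team to (goals for per game, goals against per game).
averages = {
    name: (per_game(t["gf"], t["gp"]), per_game(t["ga"], t["gp"]))
    for name, t in teams.items()
}

for name, (gfa, gaa) in averages.items():
    print(f"{name}: GF/G={gfa:.3f}  GA/G={gaa:.3f}")
```

In this made-up case Team A has more total goals, but Team B scores more per game, which is the kind of jostling the averages chart picks up.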
You can see that some teams, like the Boston Bruins and Pittsburgh Penguins, lean on their defense more than their offense. They score a good number of goals, but they are the two best teams when it comes to not allowing goals. The Philadelphia Flyers and Detroit Red Wings are stronger in the scoring department, but they allow more than Boston and Pittsburgh do. The Vancouver Canucks lie between these two groups.
Looking at the red best-fit line, I believe it can tell us how each team's balance of offense and defense compares to the league's overall balance. On the chart the slope of the line is less than 1 (technically it's between -1 and 0; it only looks positive because of my axis flip), so the league leans toward the offensive side of things. Philadelphia and the Dallas Stars are among the handful of teams close to matching the overall balance of the league, the difference being that Philadelphia is better than Dallas.
The blue line takes information from the nearby team data, as opposed to the entire NHL, to determine a best fit. The obvious analysis from this line is that the sucky teams suck. They allow lots of goals and they don't score much. What is interesting about this line is that once you get to the middle section, the number of goals allowed doesn't change much between the middle teams and the teams that score a lot. The only difference is in the offenses. What does this mean? I don't know, you tell me.
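To make the "nearby data" idea a bit more concrete, here is a stripped-down Python sketch of a locally weighted fit. It is an assumption-heavy simplification, not what R actually ran for the blue line: it fits a weighted straight line around each point using tricube weights and skips the extra robustness passes that R's `lowess()` performs. The function name `local_fit`, the `frac` parameter, and the data are all hypothetical.

```python
import math

def local_fit(xs, ys, x0, frac=0.75):
    """Fitted y at x0 from a weighted line through the points nearest x0."""
    n = len(xs)
    k = max(2, math.ceil(frac * n))  # how many neighbours to lean on
    dists = sorted(abs(x - x0) for x in xs)
    # Bandwidth: just past the k-th nearest point, so weights never all vanish.
    h = (dists[k - 1] or 1.0) * 1.001
    # Tricube weights: close points count a lot, points beyond h count zero.
    weights = [(1 - min(abs(x - x0) / h, 1.0) ** 3) ** 3 for x in xs]
    sw = sum(weights)
    mx = sum(w * x for w, x in zip(weights, xs)) / sw
    my = sum(w * y for w, y in zip(weights, ys)) / sw
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(weights, xs))
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(weights, xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return my + slope * (x0 - mx)

# Hypothetical per-game numbers, not the real standings.
gf = [2.1, 2.4, 2.6, 2.8, 2.9, 3.1, 3.3, 3.5]
ga = [3.4, 3.2, 2.9, 2.8, 2.8, 2.7, 2.8, 2.7]

for x0 in (2.2, 2.8, 3.4):
    print(f"GF/G={x0:.1f} -> local fit GA/G={local_fit(gf, ga, x0):.2f}")
```

Because each fitted value only listens to its neighbours, the curve is free to be steep at the low-scoring end and flatten out in the middle, which a single straight line cannot do.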
Let's take a look at some of the individual divisions, starting with the Atlantic and league leader Philadelphia. Their only competition is Pittsburgh, which is great defensively and really good on offense. From there you have the New York Rangers, who are also really good defensively but middle of the road on offense. Both of those teams will have to shore up their offense to make a run at the division title. Looking at the other two teams, New Jersey and the New York Islanders, who help bring up the rear in the league standings, you can't help but wonder if the other three teams had some of their numbers padded by playing them more often.
Over in the Southeast, the Tampa Bay Lightning are leading with an even balance of scoring and defense by the actual numbers, but measured against the league balance, they aren't doing so well defensively. The second-place Washington Capitals are on the other side of the red line, with a better defense and an average offense.
The Pacific, with our beloved San Jose Sharks, appears to be one of the closest divisions in terms of similarity of style, and if you look at the standings, they're the closest top-to-bottom in points. Any of these teams could find themselves battling Dallas for the lead with a modest scoring streak.
So let's have some more fun. I picked a point way past all the teams in the 'better' direction (the northeast quadrant) along the red league best-fit line, at roughly [1M, -32k], and calculated the distance from every team to that point. I really wanted to use some sort of line that ran perpendicular to the red line, but that would require a lot of stuff I'm not familiar with yet in R. Instead, I just used that really far point to minimize the lateral error, and rescaled the result to an easier-to-read scale.
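For anyone wondering how the far-point trick relates to the perpendicular-line idea, here is a hedged Python sketch of both. The slope, intercept, reference point, and team coordinates are hypothetical, and the function names are mine. The first function measures how far along the best-fit direction a team sits, which is what dropping a perpendicular onto the line gives you; the second approximates that using the distance to a target far out along the line.

```python
import math

def projection_score(x, y, slope, x_ref, y_ref):
    """Signed position of (x, y) along the line's direction, from (x_ref, y_ref)."""
    norm = math.hypot(1.0, slope)
    return ((x - x_ref) + slope * (y - y_ref)) / norm

def far_point_score(x, y, slope, intercept, x_ref, far_x=1_000_000.0):
    """Same idea, approximated by distance to a target far out on the line."""
    far_y = slope * far_x + intercept
    y_on_line = slope * x_ref + intercept
    baseline = math.hypot(far_x - x_ref, far_y - y_on_line)
    # Closer to the far target than the reference point means a positive score.
    return baseline - math.hypot(far_x - x, far_y - y)

# Hypothetical best-fit line (raw, unflipped axes) and a reference point on it.
slope, intercept = -0.5, 4.2
x_ref = 2.8
y_ref = slope * x_ref + intercept

# Hypothetical teams as (goals for per game, goals against per game).
for name, (gfa, gaa) in {"Team A": (3.2, 2.5), "Team B": (2.4, 3.1)}.items():
    exact = projection_score(gfa, gaa, slope, x_ref, y_ref)
    approx = far_point_score(gfa, gaa, slope, intercept, x_ref)
    print(f"{name}: projection={exact:.4f}  far-point={approx:.4f}")
```

The further away the target is, the less a team's sideways offset from the line leaks into the distance, so the two scores agree more and more closely.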
The reason I did this was to give us a one-dimensional way to compare the teams to a hypothetical 'target.' It's easier than eyeballing how close each of the teams is to the league best-fit line. You can do well by bombarding the back of the opponent's net, or by building a wall in front of your own, but working on both of these will make you a better team all-around. For example, Chicago and Pittsburgh are separated by a lot on the scatterplot but are very close on the bar graph, so they are getting the job done about equally well even with different team strengths.
How do the Sharks fit into all this? Well, like the title says, they're just about average. They're not that far from the NHL balance, and they're right on the line for the local balance. The good news is that if they get their stuff together and start performing like they have over the past few years, they should be able to leap-frog over the teams in the middle and get themselves set up for the post-season. Unfortunately, part of that relies on those same teams having a hiccup, and the chances of all of them doing that are slim.
Here's the data summary from R, for anyone who is interested.
Goal totals:
Min.   :101.0   Min.   :112.0
1st Qu.:130.0   1st Qu.:129.2
Median :141.5   Median :143.5
Mean   :141.1   Mean   :141.1
3rd Qu.:152.8   3rd Qu.:152.8
Max.   :174.0   Max.   :168.0

Per-game averages:
Min.   :2.061   Min.   :2.240
1st Qu.:2.653   1st Qu.:2.547
Median :2.810   Median :2.825
Mean   :2.821   Mean   :2.824
3rd Qu.:3.035   3rd Qu.:3.080
Max.   :3.480   Max.   :3.429
Where do we go from here? We could analyze the spread within each division separately, and place a point somewhere in its 'center.' That might tell us which division is stronger as a whole. The red best-fit line would also be interesting to watch as it changes over the decades. That might tell us how important scoring has been and when the focus changed to this offense-first style. We could compare division winners over the years to see how they compared to the league in general. If I could find a way to get the data easily, I wouldn't mind seeing a bunch of points plotted to show the GFA/GAA as it changes over the span of the year for a team or group of teams.
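As a starting point for the division 'center' idea, here is a minimal Python sketch that averages each division's per-game numbers into a single point. The division names and values are hypothetical, and a plain mean is only one possible definition of a center.

```python
from statistics import mean

# (division, goals for per game, goals against per game); all values hypothetical.
teams = [
    ("Division X", 3.10, 2.60),
    ("Division X", 2.70, 2.90),
    ("Division Y", 2.90, 2.80),
    ("Division Y", 2.50, 3.00),
]

def division_centers(rows):
    """Return {division: (mean GF/G, mean GA/G)}."""
    grouped = {}
    for division, gfa, gaa in rows:
        grouped.setdefault(division, []).append((gfa, gaa))
    return {
        division: (mean(g for g, _ in pts), mean(a for _, a in pts))
        for division, pts in grouped.items()
    }

centers = division_centers(teams)
for division, (gfa, gaa) in centers.items():
    print(f"{division}: center at GF/G={gfa:.2f}, GA/G={gaa:.2f}")
```

Plotting those centers on the same axes as the teams would show at a glance which division sits closest to the upper-right corner.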
That's all I got. Let me know if you have any comments or questions.