The most complete database of advanced hockey stats
I get quite a few questions sent my way so I want to answer a few of the most frequent ones here.
I want to learn more about hockey analytics, is there somewhere that provides a good overview?
I wrote an Introduction to Hockey Analytics for MapleLeafsHotStove.com which, and I am biased here, I think provides the best, most complete introduction to hockey analytics available. In it I tried to cover all the main statistics and outline the pros and cons of each one.
Where do you get your data?
I get my data from the play by play, shift/TOI tables, and event summaries that are posted on NHL.com for every game. You can find these pages by going to a box score for a game at NHL.com and clicking on the links under "Official Game Reports".
Are your stats updated automatically every night?
No. To update the stats I need to download the play by play, TOI tables and event summary pages to my own computer, run a program to update the stats, and then upload the updated stats to the web server. The process is mostly automated and doesn't take a lot of my time, but the program does take some time to run and the updated data takes a fair bit of time to upload to the website (WOWY tables include a lot of data). All told it is probably a 45-minute process, so I don't update them daily. My goal is to get them updated every Monday, Wednesday and Friday morning, but I offer no guarantees.
What is Corsi?
Corsi is a shot attempt metric and includes all shots, shots that missed the net and shots that were blocked.
What is Fenwick?
Fenwick, like corsi, is a shot attempt metric and includes all shots and shots that missed the net but not shots that were blocked.
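The difference between the two counts can be sketched in a few lines of Python. This is only an illustration: the event labels ("shot", "goal", "miss", "block") are hypothetical stand-ins for whatever codes the play by play uses, and it assumes goals are counted as shots.

```python
from collections import Counter

# Hypothetical event labels; real play-by-play files use their own codes.
ON_GOAL = ("shot", "goal")   # assumption: goals count as shots
MISSED = "miss"
BLOCKED = "block"

def fenwick(events):
    """Shots plus shots that missed the net (blocked shots excluded)."""
    counts = Counter(events)
    return sum(counts[label] for label in ON_GOAL) + counts[MISSED]

def corsi(events):
    """All shot attempts: Fenwick plus shots that were blocked."""
    return fenwick(events) + Counter(events)[BLOCKED]

sample = ["shot", "goal", "miss", "block", "block", "faceoff"]
print(corsi(sample), fenwick(sample))
```

Any event that is not a shot attempt (a faceoff, a hit) is simply ignored by both counts.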
What is PDO?
PDO is nothing more than shooting percentage + save percentage.
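As a rough worked example, here is that sum in Python. The function name and arguments are hypothetical, both percentages are expressed on a 0 to 100 scale, and the inputs are assumed to be on-ice totals for the player or team in question.

```python
def pdo(goals_for, shots_for, goals_against, shots_against):
    """Shooting percentage + save percentage, both on a 0-100 scale."""
    shooting_pct = 100 * goals_for / shots_for
    # Saves are the shots against that did not go in.
    save_pct = 100 * (shots_against - goals_against) / shots_against
    return shooting_pct + save_pct

# 8 goals on 100 shots for, 7 goals allowed on 100 shots against.
print(pdo(8, 100, 7, 100))
```

With those inputs the shooting percentage is 8 and the save percentage is 93, so the sum lands just above 100.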
What do your Hockey Analysis Ratings (HARO, HARD, HART) mean?
Here is a brief summary of the Hockey Analysis Ratings. HARO is an offensive rating, HARD is a defensive rating and HART is an overall/total rating. I have calculated these ratings using goal, shot, fenwick and corsi data.
Do you have a quality of competition/teammate metric?
Yes, there are a few such metrics. For quality of teammates, the first thing you can look at is the TMGF20, TMGA20 and TMGF% metrics (along with their shot, fenwick and corsi counterparts) for a given player. These are indicative of the offensive, defensive and overall ability of the player's teammates. Alternatively you can use HARO QOT, HARD QOT, and HART QOT, which are the average HARO, HARD, and HART of the player's line mates. The same stats exist for Quality of Competition under OppGF20, OppGA20, OppGF%, HARO QOC, HARD QOC, and HART QOC, along with their shot, fenwick and corsi based counterparts.
Are these competition/teammate metrics better than CorsiRel and CorsiRel QoC that I see referenced elsewhere?
In my opinion yes, because of a few key differences. CorsiRel is a direct comparison of team performance when the player is on the ice vs when the player is off the ice. When I calculate my quality of teammate stats I compare each teammate's stats when they are on the ice with the player and when they are not on the ice with the player. This is better because it directly looks at the quality of line mates the player plays with, not the quality of the team the player plays on, when he may not actually play with some of those players (i.e. Pavel Datsyuk rarely played with Kris Draper, so why should we consider Kris Draper's stats when measuring Datsyuk's quality of teammates?). CorsiRel QoC is just an average of the opponents' CorsiRel, so if CorsiRel isn't as good, neither is CorsiRel QoC.
Think of it this way. Corsi Rel tells you whether the player is one of the better players on his team and Corsi Rel QoC tells you whether the player plays against the opponent's better players. The quality of teammate and competition metrics on stats.hockeyanalysis.com are truer quality of teammate and competition metrics and thus are more reliable for comparing players on different teams.
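To make the on-ice versus off-ice comparison concrete, here is a minimal sketch of a CorsiRel-style number. It is not taken from any particular site's implementation: the function names are hypothetical, and expressing the result as a corsi differential per 60 minutes is an assumption.

```python
def corsi_diff_per_60(corsi_for, corsi_against, toi_minutes):
    """Corsi differential scaled to 60 minutes of ice time (assumed rate)."""
    return 60 * (corsi_for - corsi_against) / toi_minutes

def corsi_rel(on_ice, off_ice):
    """On-ice rate minus off-ice rate; each argument is (CF, CA, TOI in minutes)."""
    return corsi_diff_per_60(*on_ice) - corsi_diff_per_60(*off_ice)

# Player on the ice: 300 for, 250 against in 300 minutes.
# Player off the ice: team is 900 for, 950 against in 1200 minutes.
print(corsi_rel((300, 250, 300), (900, 950, 1200)))
```

Note that the off-ice numbers pool every other player on the team, whether or not they ever share the ice with the player, which is the limitation described above.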
Should I use goal, shot, fenwick or corsi ratings?
This is a bit controversial. There are many who believe that the majority of a player's value can be determined from shot based metrics, but I believe goal based metrics have value as well. The main issue with goal based metrics is that goals are relatively infrequent events as far as statistics are concerned, and because of this it is difficult to get a large enough sample size to weed out all the randomness that occurs with them. From my research, the benefits of using a goal based analysis begin to overtake the benefits of the greater sample size of shot attempts at about one full season's worth of data. So, if you are looking at less than a full season's worth of data (i.e. ratings for a not yet complete season, or for a player that missed a significant number of games due to injury, or fourth liners who don't get a significant amount of ice time) it is probably best to consider a corsi or fenwick based rating (it doesn't really matter which since both are highly correlated). If you are looking at ratings using more than one season of data I'd definitely consider a goal based metric the better metric. At about the one full season mark the two metrics probably have equal value so it probably doesn't matter which one you use, but I'd tend to lean towards a goal based metric.
An alternative would be to look at a goal based metric for offense (i.e. HARO) and a corsi or fenwick based metric for defense (i.e. Fenwick HARD). The reason why one might want to use a shot based metric for defensive rating is to ensure that the goalie is fully factored out of the equation (I attempt to do this in goal based HARD rating, but I am not fully convinced that my method actually accomplishes this though in theory it should).
How do you adjust for zone starts?
I adjust for zone starts by ignoring the first 10 seconds after a face off in either the offensive or defensive zone. I do this because it has been shown by both myself and others that the benefit of a zone start is almost completely negated after 10 seconds of play. Currently there is unadjusted 5v5, 5v4 and 4v5 data available and zone start adjusted 5v5, 5v5 close, 5v5 tied, 5v5 leading and 5v5 trailing data. I hope to add unadjusted 5v5 close, 5v5 tied, 5v5 leading and 5v5 trailing at some point in the future in case anyone is interested in it but it isn't a high priority (some people have requested this for team data especially).
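The 10 second rule can be sketched as a simple filter over a game's events. This is an illustration under stated assumptions, not the program I actually run: the event dictionaries, their keys ("time", "type", "zone") and the zone labels ("off", "def", "neu") are hypothetical, and an event exactly 10 seconds after the face off is kept.

```python
def zone_start_adjusted(events, window=10):
    """Drop events in the first `window` seconds after an offensive or
    defensive zone face off. `events` must be sorted by time (seconds)."""
    kept = []
    last_zone_faceoff = None
    for ev in events:
        if ev["type"] == "faceoff":
            # Only offensive/defensive zone face offs start a window.
            if ev["zone"] in ("off", "def"):
                last_zone_faceoff = ev["time"]
            continue  # face offs themselves are not counted as events
        if last_zone_faceoff is not None and ev["time"] - last_zone_faceoff < window:
            continue  # still inside the ignored window
        kept.append(ev)
    return kept

game = [
    {"time": 0, "type": "faceoff", "zone": "off"},
    {"time": 4, "type": "shot", "zone": "off"},    # ignored
    {"time": 12, "type": "shot", "zone": "off"},   # kept
    {"time": 30, "type": "faceoff", "zone": "neu"},
    {"time": 33, "type": "shot", "zone": "def"},   # kept, neutral zone face off
    {"time": 60, "type": "faceoff", "zone": "def"},
    {"time": 69, "type": "miss", "zone": "def"},   # ignored
    {"time": 70, "type": "block", "zone": "def"},  # kept
]
print([ev["time"] for ev in zone_start_adjusted(game)])
```

Time on ice would need the same treatment so that rates are computed over the matching adjusted minutes.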
What does 5v5 close mean?
Close play is when the game is tied or within one goal in the first or second periods, or tied in the third period.
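Written as a small Python check, the rule looks like this. The function name and arguments are hypothetical, and treating anything after the second period the same way as the third (tied only) is an assumption.

```python
def is_close(period, goal_diff):
    """True when the score situation counts as 'close'.

    period: 1, 2, 3 (later periods are treated like the third, an assumption)
    goal_diff: goals for minus goals against at that moment
    """
    if period in (1, 2):
        return abs(goal_diff) <= 1   # tied or within one goal
    return goal_diff == 0            # third period: tied only

print(is_close(2, -1), is_close(3, 1))
```

So a one goal game in the second period counts as close, but the same score in the third period does not.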
Why are the stats on here different than what I see on other websites?
There are a few reasons that might cause this. First, because stats.hockeyanalysis.com is not automatically updated and is generally only updated every couple of days, the stats may be slightly out of date. Another reason is that some other sites don't factor out goalie pulled situations. None of the 5v5 data on stats.hockeyanalysis.com includes play when the goalie is pulled (essentially a 6 on 5), whether in late game or delayed penalty situations.
Can we get playoff data?
Playoff data is on my to-do list, but it is not a high priority because my goal is player evaluation at a macro level and I don't believe that playoff data is particularly useful for this purpose. That said, I understand why people want it and why it is useful/interesting to look at, so I hope to get it added to the site at some point.