AP Hockey Story of the Day: October 31

Yesterday I discussed a great piece on Bill James written by Joe Posnanski, somebody I read consistently and whose approach I try to emulate in much of my more conceptual writing. But Joe blogged today about the most discussed play of Wednesday's Game 7 of the World Series and made a point that I vehemently disagree with. Here's the story.

Here’s the relevant passage:

“But my point is this: You don’t get a second choice in real life. You choose once and that’s it. And the reveal — you chose poorly — becomes the reality. And so when you look back at something that didn’t work, you now know that anything else, even the stupidest possible choice, MIGHT have worked. The only thing we know for an absolute fact is that the choice made failed.

In this case, we know how the World Series ended. It ended with Giants pitcher Madison Bumgarner entirely overmatching Royals catcher Salvador Perez, who hit a foul pop-up to end the game. That’s what happened, and it is unchangeable and, so, in the end, unjustifiable. If given the option to go back in time, the one thing you KNOW WILL NOT WORK is to let Perez hit.

It always entertains me when some coach or manager makes a move that doesn’t work and then grumps, “if I was given that exact same situation again, I’d do it again.” No you wouldn’t. It didn’t work. You’re telling me if time was reversed, and another chance was given, that Grady Little wouldn’t pull Pedro? Don Denkinger wouldn’t call Jorge Orta out? The Portland Trailblazers wouldn’t take Michael Jordan or Kevin Durant?” (bolding my own)

The passage I have a problem with is the bolded one – the claim that letting Perez hit is the one thing you know will not work. Sure, you know the decision to hold Gordon didn't work out. But sport, like life, is probabilistic. There is some percent chance that the next batter, Salvador Perez, would have found a way to drive Gordon in from third. We don't know exactly what that percentage is – we never could – but we can estimate it based on Perez's batting numbers, Bumgarner's pitching numbers, park effects, defense, etc. The specific percentage isn't important for this argument; the point is that it exists.

Going into the decision of whether or not to send Gordon home, the third base coach has to have at least a vague idea of a) the percent chance of Gordon making it home safely, and b) the percent chance of Perez driving in a runner from third. In theory, if "a" is greater than "b", you send the runner. Of course, the third base coach, Mike Jirschele, couldn't possibly have calculated all that in a split second, but that's the advantage of being an experienced third base coach (something that analytics folk often don't recognize). When you are put in situations like that over and over, your instincts get better and better, and more often than not you'll make the right call on feel alone.

Now, it's also possible that Jirschele didn't take Perez into account at all – that he simply asked, "is there a greater than 50% chance that Gordon makes it home right here?" and decided the answer was no. That would mean leaving critical information out of the equation, which isn't ideal, but it would still rely on that same baseball sense and experience.
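For the numerically inclined, here's a minimal sketch of that decision rule in code. The two probabilities are made-up placeholders, since, as noted above, we can only ever estimate the real ones.

```python
# A toy version of the send-or-hold decision: compare the chance the runner
# scores if sent against the chance the next batter drives him in if held.
# Both numbers are invented for illustration; the real ones are unknowable.

def should_send_runner(p_safe_if_sent: float, p_driven_in_if_held: float) -> bool:
    """Send the runner only if his chance of scoring on the play beats the
    chance that the next batter ties the game anyway."""
    return p_safe_if_sent > p_driven_in_if_held

# Hypothetical: Gordon scores 30% of the time if sent, Perez drives him in
# 20% of the time if held -> sending is the better bet.
print(should_send_runner(0.30, 0.20))  # True
```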

But back to probabilities. Posnanski's assertion is that you could replay that situation hundreds of times and each time Perez would make an out and the game would be over, making the decision to hold Gordon wrong in a vacuum. But that's just not how sports work. Say that Perez's chance of getting a hit off Bumgarner there was .200. That's not unrealistic; after all, Bumgarner was dealing. It would mean that one time in five, holding the runner, the Royals would have tied the game.

Now let's get back to Posnanski's statement: "If given the option to go back in time, the one thing you KNOW WILL NOT WORK is to let Perez hit." But you don't know that. In fact, you approximate the chance of it working at 20%. Even if somebody gave you a look into the future and showed you a vision of Perez popping up, that would only be one potential reality. In one out of every four of the others, Perez would drive the runner home. The thinking is flawed.
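If you want to see that in the simplest possible terms, here's a quick simulation sketch. The 20% is still just an assumed number, but it shows why one observed pop-up doesn't make the hit impossible.

```python
import random

# Replay the Perez at-bat many times under an assumed 20% chance that he
# drives the run in. The point isn't the exact number; it's that the pop-up
# we actually saw is only one of the possible outcomes.

random.seed(0)
P_TYING_HIT = 0.20   # assumed probability, as in the text
N_REPLAYS = 10_000

ties = sum(random.random() < P_TYING_HIT for _ in range(N_REPLAYS))
print(f"Royals tie the game in {ties / N_REPLAYS:.1%} of replays")  # roughly 20%
```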

If a football coach goes for it on 4th-and-1 needing a touchdown and doesn't get it, that doesn't make it the wrong call. If a hockey coach pulls his goalie on the power play, down a goal with a minute left, and the other team scores, that doesn't make it the wrong call. You can't judge a decision by its result, because results are variable; process is not. If the aforementioned probability "a" had been higher than "b", then over a long series of similar situations Jirschele would have come out on top. We know the playoffs aren't really an iterated game, but that doesn't make probabilistic modeling any less valid.

We don’t really know whether or not the Royals made the right call – although there is certainly evidence from many that it was the right one – but we do know that the result of the game isn’t the judge of that. The result only muddies the process.

AP Hockey Story of the Day: October 30

It’s been a busy few days for great writing, and with midterms I didn’t get a chance to post anything yesterday, so this post is going to highlight multiple great reads with some thoughts.

1. Dave Feschuk of the Toronto Star penned this interesting piece on the Toronto Raptors' use of bioanalytics to prevent injuries. Bioanalytics isn't something I've delved too far into since, simply put, it's not my field of passion or expertise, but there is undoubtedly an opening for this type of work in hockey. The tough part when it comes to any kind of physical testing (this relates back to speed and performance tracking as well) is separating the meaningful material from the noise. For example, the fact that player X averages speed Y in a game one day could mean that he needs more sleep or that his training routine must be adjusted, but it could also just be a symptom of the fact that in certain situations it doesn't make sense to go full speed. It's something teams will need to handle with care, but with the amount of money at stake, and the ruinous effect injuries can have, it's worth the investment.

2. Jacob Rosen of Sports Analytics blog wrote their (always solid) weekly analytics roundup and discussed why it’s so important for good public work to continue to prosper and be circulated. If you’re not following Sports Analytics Blog on Twitter, make sure to do so. You’ll miss far less great content from now on.

3. Friend of the blog Rob Vollman published a nice look at coaching changes over at ESPN Insider. The research seems overwhelming that in-season coaching changes generally don't have a massive impact on possession numbers; instead, it's the less sustainable shooting and save percentages that tend to regress, making new coaches look all the better. My question with regard to this would be: we know that something like a coaching change can motivate players to try just that little bit harder, to pay a little more attention to detail, at least for a few weeks. Just because a high shooting or save percentage isn't sustainable, does that mean it can't be influenced by such a kick in the pants? Whether one can find non-variance-based causal explanations for unsustainable performance is, at least to me, one of the great mysteries of sports. And it's something that hopefully we can find a way to determine in the future.

4. On that note, last – but certainly not least – Joe Posnanski (possibly the greatest sportswriter on the planet) wrote an awesome feature on Bill James, somebody who I admire greatly, not just for what he did for analytics, but for his ability to recognize areas in which he failed early on, and how he has had to change his attitude upon further reflection.

“Absence of evidence is not evidence of absence.”

Obviously Bill James didn't coin the phrase, but I will always associate it with him. How hard must it be for essentially the grandfather of modern baseball statistical analysis to turn around and say that clutch hitting exists as a trait, despite what the numbers say on the subject? Going back to the Vollman piece, there is no evidence that shooting and save percentages can be heavily influenced by individuals in the short term, since they tend not to persist. But I'm not sure I'm willing to close the book on that just yet.

AP Hockey Story of the Day: October 28

Sean Fitz-Gerald wrote a good piece in the National Post today on "compete level," a buzzword that has followed a number of teams around – the Leafs in particular – through their "we are who they thought we were" period of self-review over the last couple of years.

I haven't watched more than a couple of Leafs games this year, so I can't comment on their performance specifically, but I do think that most coaches point to a lack of hard work as the cause of losing at times when talent level, variance, or poor coaching are probably more apt explanations. It's possible, however, for a team to simply not work hard enough. I've certainly been on teams like that, and sometimes it reflects badly on the coach; sometimes it just reflects badly on the players themselves.

So how can one tell if a team is truly getting poor results because of work ethic? How can you know if you need to clean house on that front? Well, a good place to start might be this old post from the awesome Phil Birnbaum discussing Bob McCown and his book, "The 100 Greatest Hockey Arguments". McCown suggests an experiment in which objective viewers watch games after the fact with the plays that involve goals edited out. Could those people figure out who won the games in such a setting? Doubtful. They might, however, come away with a less biased idea of which team had the higher "compete level," without the benefit of goal-based hindsight. I'm not sure how applicable this is to a team environment, or how it could be acted upon, but it at least illustrates the bias that comes from assessing performance under the caption of a "W" or an "L".

Just a thought.

AP Hockey Story of the Day: October 23

Craig Custance has a great (unfortunately insider only) piece on coaching salaries in the NHL, and Mike Babcock’s stand for his fellow coach. It’s a very interesting read.

The biggest reason hockey is so in need of a catch-all stat isn't for basic analysis – combining different metrics isn't so hard – but so that there's a baseline for salaries. Being able to say that Bryan Bickell adds 2.4 wins per season above replacement while Joel Quenneville adds 3 would be huge, and would affect salaries in a big way, equivalent to what has happened at the GM level in baseball.

AP Hockey Story of the Day: October 22

Considering that a large part of what I use Twitter for is finding great things to read and then tweeting them out to my followers, I've decided to streamline this process a little bit. Since some of my analytics work (but don't worry, not all of it) will be transferring to Hockey Prospectus, I've decided to pick a piece of writing every day (okay, probably not every day) that I believe to be truly great. This could be a work of analytics or something conceptual; it doesn't even necessarily have to relate to sports, although it most often will. Instead of just posting said piece on Twitter, I will post it here under the headline "AP Hockey Story of the Day" and, time permitting, will offer some brief thoughts on the piece.

So long story short, if you like reading the types of articles I tend to tweet out, make sure you either visit the blog frequently or, at the very least, stay tuned to my Twitter account, as I'll tweet out links to these posts each day.

Without further ado, my story of the day for October 22nd is this piece by Jack Moore of Vice, "How Wall Street Strangled the Life out of Sabermetrics".

Some thoughts:

1. Moore writes something of a bittersweet tale of the birth and, to an extent, death of the public analytics movement in baseball. Morganization, the process that revolutionized efficiency in the business world in the early 20th century, is a great parallel to what took place in baseball ten years ago, and what seems to be occurring in hockey today.

2. The big question is what comes next. How far do hockey teams go in their attempts to keep advanced knowledge out of the public sphere, and has the so-called "Summer of Analytics" in fact set the sport back in terms of public analytical progress?

3. The next big frontier in hockey is determining a valid catch-all statistic. Not because a catch-all in itself is important, but because it’s critical in evaluating performance monetarily. For example, how much does a win cost? How much does a 30-goal scorer cost? How much is a top-end GM worth? With that type of analysis, the market should correct itself even more than it already has. Sidney Crosby will make more money. Stan Bowman will make more money. Eventually, Stan Bowman’s analytics team (if it can continue to distinguish itself from the pack) will make more money.

4. I loved the anecdote about British mathematician G.H. Hardy. The guy was thrilled to be working in meaningless mathematics because his work couldn’t, say, help scientists to build weapons of mass destruction. Sure enough, after his death, Hardy’s work became the foundation for modern cryptography. It had use after all.

The Deployment Dilemma: How Fourth Lines Can Maximize Team Output

The season is finally here, which is great for most of us. But for some young up-and-coming players, it means getting cut from NHL rosters (whether now or following brief tryouts) while players nobody would argue are their equals in skill occupy the league's fourth-line spots. It's an interesting dilemma, and one that, to my knowledge, nobody has yet managed to quantify. I'm only going to take the first step here, and speak in very broad generalities, but I hope that this piece will frame the debate over the use of the fourth line a little better, and present some evidence that maybe it's time for change.

But first, a little perspective. Hockey wasn't always about toughness or grit. In Canada, hockey's roots lie in lacrosse, as early Canadians sought a winter alternative to their favored summer pastime. They implemented elements of rugby (like the ban on forward passing), which certainly brought an element of ruggedness to the game, but keeping the puck – or, in the earliest days, the ball – was basically as important then as it's seen to be by analytic types today. In Russia, meanwhile, hockey developed from soccer. Possession there was also critical, and simply put, the best players played. There wasn't much need for pests, or rats, or whatever you want to call them. It would have been out of character considering the sport's origins.

But then the wars hit. First World War I, then World War II. Many of the top hockey players in North America, even those who had been competing in the National Hockey League, went overseas, and the league had to cope with less talent and declining attendance. It was then that the forward pass was implemented, the blue lines were created, and, in order to add some concrete excitement to a dying league, fighting was specifically outlined and legalized in the rulebook. Of course teams, fearing for their survival and suffering from a lack of talent, found guys to come in, play minimal minutes, and fight plugs on other teams. Fighting was a release for both players and fans from war tensions – to be clear, fighting had been around before, it just hadn't been formally allowed.

Due to the lack of talent at the time, the dump and chase also became prominent. Many of the “replacement players” didn’t have the talent necessary to get by defensemen with the puck at the blueline, so coaches relied on players’ natural speed and toughness to give up the puck only to retrieve it. The practice became a staple of North American hockey, and while it was somewhat slowed by the end of the war and the return of many original players, the expansion of the 1960s allowed it to develop and thrive once again. Hockey had become watered-down, and dump-and-chase was the safest strategy for those players the coaches didn’t trust to get into the zone safely with the puck still on their stick.

Why is this important? Because there has never been an inherent necessity for toughness, at least not in the way players like Shawn Thornton, George Parros, and other enforcers embody it. Toughness is important so that a player doesn't get fazed by failure, so that he can stand up to bullies, so that he can fight through checks – but not in terms of the ability to fight, or to sit menacingly on the bench. The Chicago Blackhawks of the past few years have shown that, if nothing else.

So why is the fourth line necessary, in its current form, according to most hockey people? Thankfully, the language from coaches over the past few years has shifted from talk of the "top 6" to the "top 9 forwards," at least recognizing the importance of talent over physicality to an extent. But why should it stop there? Why shouldn't we be talking simply about NHL and non-NHL forwards? What are the reasons the disconnect between "top-9" and "bottom-3" forwards still exists? Is this yet another example of the NHL not adapting nearly quickly enough to the evolution of the game? Or are there still reasons to keep things as they are?

I thought about it, and came up with two good reasons why I felt a conventional fourth line might be necessary:

1) Penalty Killing.

Advanced hockey analysis tends to focus on even strength over special teams for the same reason it focuses on shot quantity over quality: not because the latter doesn't exist, but simply because the former is far more easily quantifiable, and thus firmer conclusions can be drawn.

Special teams are very difficult to analyze at this point because the sample sizes are small, and because units play so much time together (against very similar competition) that, unlike at even strength, it is nearly impossible to determine which individuals are responsible for success or failure.

Many fourth liners are used on the penalty kill, but whether they are truly necessary for that reason is an open question. I didn’t do any numerical analysis on this, but I would guess that – especially as the appreciation for good two-way centers has increased – the need for those specialists is on the decline. Take for example, Max Pacioretty, a player who nobody would think of as a conventional penalty killer, but who last year, thanks to his high-energy, long skating stride, active stick, constant pressure, and improved positioning, was among the best penalty-killing wingers in the game.

I would argue that even at this point, having PK specialists can help a team. Organizations can, however, fairly easily get by without them, as long as the rest of their personnel is arranged in a way to pick up the slack. This is something I might focus on in a future post.

2) Limited ice time.

The other reason it made sense to me not to treat bottom-3 forwards the same way as the others is that they don't get much ice time. According to this great piece by Garret Hohl (which I will reference more later), fourth liners average 8:03 of even-strength ice time per game. If a team has a bright young player, it's understandable why they would want him playing top minutes in the AHL, say, rather than under 10 minutes in the bigs. Development, etc., etc. The limited ice time for fourth liners makes sense on the surface because it frees up the top liners (the big guns) to play more minutes and thus, in theory, maximize their value.

BUT.

But, here’s where opportunity cost comes into the picture. Without numerical analysis, how can you know that the conventional method actually maximizes the use of those top players, and of the team as a whole? Essentially the question is: What is the opportunity cost of demoting talented youngsters in favor of less-skilled veterans, in order to play top players top minutes, in terms of goal differential per game?

Here is my very broad and basic attempt to answer that. Obviously I recognize that the situation changes by team and by year, so further exploration would be necessary before any specific team came to a conclusion on this.

[Chart: NHL forwards grouped into even-strength TOI buckets, from Garret Hohl's piece]

Here are the relevant columns of the first chart from that piece by Garret, which groups NHL players into buckets based on ice time. The total TOI/G at even-strength (these are all ES numbers) is 46:59, so I’m going to work under the assumption that that is approximately the amount of ES play per game. Any model for ice time needs to work within that parameter.

After examining this chart, I took a look at average ES ice time for rookie forwards in the NHL who played in at least 50 games last year. That number was 12:06. If you took out the three who played less than 10 minutes per game, that number would be 12:44, but the important fact I took away from that examination was that a number of teams were content with rookies averaging anywhere above about 11 minutes of ES ice time.

[Chart: even-strength ice time for rookie forwards last season]

If somebody as talented and young as Tyler Toffoli can play 11 minutes at ES per night and a smart organization like the LA Kings can be okay with it, then the same can be said for just about anyone, I think. Presuming they’re NHL ready.

So the next thing I did was assemble a list of young forwards who were either cut this training camp in favor of less talented NHL veterans because there was no space in the "top 9" for them, or who had been in danger of it. I crowdsourced this to Twitter, and what follows is the list I decided upon.

[Chart: the crowdsourced list of young forwards, with their combined possession and zone-start numbers]

The list of players isn’t actually very important, as long as you’re willing to accept that some young player who may miss out because they don’t fit a fourth line role could potentially have a CF% of 50.25%, as well as the rest of the production you see above.

Disclaimer: the 54.12% offensive zone start figure isn't directly relevant to this illustration, but I will note that it does confound the results somewhat, so it's something for future examination. The conventional fourth liners likely – although not necessarily – have far lower OZS%s, and therefore their CF%s are bound to be lower. I don't think that renders this study useless, but it's something to be taken into account.

So with this list of eight rookies, I essentially made a fifth bucket – AHL/NHL tweeners, one might say – for comparison. I took the conventional bucket format and weighted shot and goal differentials, along with save and shooting percentages, to arrive at an expected GF%. The numbers were slightly off – from rounding in the data collection, I'd expect – but here's what I found:

[Chart: expected goal differential per 60 minutes of even-strength ice time, by TOI bucket]

That bolded number is the expected goal differential per 60 minutes of even-strength ice time. What I did next was see whether there was a TOI arrangement that allowed for the following things:

1) The fourth line, now composed of young skilled players, to play at least 11 minutes

2) None of the other lines to play as little as the fourth line

3) To keep every line happy

4) To improve expected goal differential

5) Obviously, not to surpass ~47 minutes of even-strength ice time per game.

Here is what I found:

[Chart: the proposed even-strength TOI arrangement and the resulting expected goal differential]

If one plays the first line 12.5 minutes (rather than 14.42), the second line plays 12 (rather than 12.83), the third line stays the same, and the fourth line plays 11 minutes, then the expected goal differential can be improved upon. The actual amounts don’t matter all that much, since the numbers are vague overall, but the point is that it could be done.
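For anyone who wants to see the arithmetic behind that kind of comparison, here's a minimal sketch. The first-, second-, and fourth-line minutes are the ones quoted above; the third-line minutes, the shot rates, and the percentages are placeholders I've invented for illustration, not the actual values from Garret's buckets or from my fifth bucket.

```python
# A back-of-the-envelope sketch of the ice-time comparison. Only the first-,
# second-, and fourth-line minutes come from the post; the third-line minutes
# and every shot rate and percentage below are made-up placeholders.

def xgd_per_60(cf60: float, ca60: float, sh_pct: float, sv_pct: float) -> float:
    """Rough expected goal differential per 60 minutes: attempts for times the
    rate they go in, minus attempts against times the rate they get through."""
    return cf60 * sh_pct - ca60 * (1 - sv_pct)

# Assumed per-line attempt rates and percentages, lines 1 through 4.
# In the proposed setup the fourth line is the bucket of skilled youngsters.
conventional_rates = [(62, 52, 0.055, 0.950), (58, 55, 0.050, 0.950),
                      (55, 56, 0.045, 0.950), (50, 58, 0.040, 0.950)]
proposed_rates = conventional_rates[:3] + [(57, 56, 0.048, 0.950)]

conventional_toi = [14.42, 12.83, 11.70, 8.05]   # roughly 47 ES minutes total
proposed_toi     = [12.50, 12.00, 11.70, 11.00]

def team_xgd_per_game(toi_list, rate_list):
    """Each line's expected GD/60, weighted by its share of even-strength ice time."""
    return sum(toi / 60 * xgd_per_60(*rates) for toi, rates in zip(toi_list, rate_list))

for label, toi, rates in (("conventional", conventional_toi, conventional_rates),
                          ("proposed", proposed_toi, proposed_rates)):
    assert sum(toi) <= 47.5, "stay within roughly 47 even-strength minutes"
    print(f"{label}: expected GD per game {team_xgd_per_game(toi, rates):+.3f}")
```

Under those made-up numbers the proposed arrangement comes out ahead, which is all the sketch is meant to show: the better buckets end up with a bigger share of the minutes, and the worst bucket disappears.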

What about the fact that your top players are now only playing 12.5 minutes at even strength? Good, now they’re a) Less tired at the end of games, b) Less tired at the end of the season, c) Less injury prone. Or, taking it in another direction, they can now play more on the power play where their skill set is maximized, and even in some cases take more responsibility on the penalty kill, since the fourth line specialist is now extinct. Max Pacioretty, for example – somebody who has been injured somewhat frequently during his career, and who plays both power play and penalty kill – would certainly thrive under such an arrangement.

Conclusion.

So if you didn't read all of that, what is there to take from this piece? Essentially, based on quite rudimentary research and analysis, it appears as though the opportunity cost of cutting youngsters in favor of more established, low-ice-time grinders is very large, and that the practice ultimately isn't worth it. Teams, by and large, would be better off playing their 12 best forwards overall and distributing even-strength ice time more evenly.

 

2014-15 NHL “Why Not” Ranking Predictions

Predictions are pretty dumb since, as I've written about 100 times, hockey is a game of probabilities. That said, since I now have a space to write about whatever I want, and since I've been doing an awful job lately filling it with words, here is my view on what is most likely to go down. Feel free to argue in the comments or on Twitter.

Atlantic:

1. Tampa Bay

2. Boston

3. Montreal

Metropolitan:

1. Pittsburgh

2. New Jersey

3. NY Islanders

Wild Card:

1. NY Rangers

2. Columbus

—————

3. Detroit

4. Washington

5. Toronto

6. Florida

7. Philadelphia

8. Ottawa

9. Carolina

10. Buffalo

 

Central

1. Chicago

2. St. Louis

3. Dallas

Pacific

1. Los Angeles

2. Anaheim

3. San Jose

Wild Card

1. Minnesota

2. Nashville

————–

3. Colorado

4. Vancouver

5. Edmonton

6. Winnipeg

7. Phoenix

8. Calgary

 

Stanley Cup Pick: Chicago over Pittsburgh. The one we’ve wanted for a while, entertainment-wise.

On Zach Parise, Players Using Analytics, and Healthy Skepticism

There's a show called Modern Family in which a couple adopts a child named Lilly. The show plays up the little girl's typical curiosity: at varying points she'll go into a phase where she just asks "why" continually until her parents get fed up. For some reason I thought of that while reading this piece by the Star Tribune's awesome Minnesota Wild columnist Mike Russo this morning. Parise has always been one of the players I have most admired, because I think he blends the old-school abilities to work hard, go to the net, block shots, and play in all situations with the more new-school skills of puck handling, entering the offensive zone with control and, most importantly, scoring goals. I may be Canadian, but even I could appreciate how cool his game-tying goal in the 2010 Olympic gold medal game was (and let's just not forget how that game ended).

Anyway, simply by asking "why," Parise was able to figure out that his attitude toward the dump and chase needed to change. I wrote yesterday about why I think it's important that players understand and embrace analytic concepts, rather than simply heeding future analytics-minded coaches' advice. And I think Parise is a good example of why. In his first season in Minnesota, he made this comment:

“We went to the Finals dumping and chasing. We did it more than anybody. And we scored a lot.” And for players who don’t understand the probabilistic nature of hockey (hint: most surely don’t), that attitude is pervasive. You can win while dump-and-chasing, just like you can win without taking a lot of shots. But that doesn’t mean it will continue or that there aren’t problems.

“I just got kind of, not brainwashed, but my last couple years in New Jersey we were so adamant about dumping the puck in,” Parise says in Russo's article. And the same goes for every player. They grow up learning to play a certain way, and unless they're lucky enough to pass through Kyle Dubas and Sheldon Keefe's Greyhounds program, they likely never have what they've learned challenged. And the older you get, the harder it is to unlearn.

So let’s be Lilly for a moment, and consider the type of scrutiny that could lead a player to challenge his or her own views:

I’ve been told to dump the puck nearly exclusively in the neutral zone.

Why?

Because that’s the way it’s always been done.

But why?

Because you need to get the puck deep.

Why?

Because by getting the puck deep you can skate hard and go regain possession in the offensive zone.

But didn't you already have possession? By dumping the puck in, wouldn't you be giving the puck up just to go get it again?

Well sure, but if you try to carry the puck in you might give up possession and cause a turnover.

But isn’t dumping it in essentially a turnover anyway?

Well kind of, but it would be a turnover in a less dangerous place.

But if you gain more possession, shots, and chances from carrying the puck in, isn’t it possible that you earn more net goals by carrying the puck in as much as possible, even if sometimes it results in dangerous turnovers?
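To put rough numbers on that last question – and these are entirely hypothetical numbers, not anyone's published zone-entry data – here's the expected-value comparison it's gesturing at:

```python
# Made-up numbers, purely to illustrate the trade-off in the question above:
# even if a carry-in attempt fails some of the time, the successful entries
# can generate enough extra offense to beat dumping the puck in every time.

P_CARRY_SUCCESS = 0.70         # assumed: chance a carry-in attempt keeps possession
SHOTS_PER_CARRIED_ENTRY = 0.6  # assumed: shots generated by a controlled entry
SHOTS_PER_DUMPED_ENTRY = 0.3   # assumed: shots generated by a dump-and-chase
SHOTS_PER_FAILED_CARRY = 0.0   # assumed: a turnover at the blue line yields nothing

expected_carry = (P_CARRY_SUCCESS * SHOTS_PER_CARRIED_ENTRY
                  + (1 - P_CARRY_SUCCESS) * SHOTS_PER_FAILED_CARRY)
expected_dump = SHOTS_PER_DUMPED_ENTRY

print(f"carry it in: {expected_carry:.2f} shots per entry attempt")
print(f"dump it in:  {expected_dump:.2f} shots per entry attempt")
```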

And that’s where reading something like this, or any of the shorter summaries around the web, can come in handy for players. You don’t need a background in math, or even heavy reading skills, to understand the concepts or the conclusions. They’re hockey concepts after all.

Players don’t need to spend their time figuring out hockey’s yet to be unearthed inefficiencies, but they do need to have an open mind and a healthy skepticism. As do we all. Always ask “why,” and if you can’t find a satisfactory answer, change your approach until you find one.

On whether analytics are useful to players

There have been some interesting (let's be honest, ridiculously annoying) quotes recently on the acceptance of analytics among coaches and players. "Of course coaches should use it," the popular refrain goes, "but I don't think players themselves can take much use from it directly." Well, this interview between FanGraphs and Oakland A's outfielder Brandon Moss suggests otherwise. Give it a skim and take note of how Moss's awareness of the situations and conditions in which he has success influences the choices he makes with regard to his technique and strategy. Sure, it's easy to say that coaches should look at the numbers and then relay general concepts to the players, but with that approach you're opening yourself up to the same acceptance problems you get when bringing general concepts to GMs without numbers to back them up.

Of course, not every player will have an interest in looking at numbers, and they don't all have to. There will soon enough be companies – like there are in basketball – hiring themselves out to players, rather than teams, for individual analysis. Which side am I better at driving the zone towards? Do I shoot better glove high or stick low? Do I allow too big a gap against charging forwards?

This is all info that players who learn about analytics can derive instantaneously, and cutting out the middle man – while in some cases irritating coaching staffs – can lead to big payoffs.

So yes, analytics are useful to players. And soon enough, many will be using them directly.

On whether descriptive, non-predictive stats really have use

As many of you know, I love reading about baseball and soccer analytics. I think that in baseball’s case, having the “first mover” element means that later-adapting sports like hockey can look to copy many of their concepts and ideas, even if the sports are quite different. Soccer, meanwhile, is quite a similar game to hockey – just a slower version – and therefore many of the more specific practices translate quite well.

Soccer is at an interesting point, though, because analysts there, unlike in baseball, are uncovering new statistics (courtesy of companies like Opta) while also just now figuring out which of those statistics are meaningful. You can get an idea of how far behind soccer analytics is from the fact that its main predictive statistic, Total Shots Ratio – which is essentially the soccer version of Corsi – is actually based on Corsi.
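For the unfamiliar, TSR is just the share of a match's shots that a team takes, structurally the same calculation as a Corsi-for percentage (the sample numbers below are made up):

```python
# Total Shots Ratio: the share of all shots in a match taken by one team,
# the same for/(for + against) construction as a Corsi-for percentage in hockey.
# The example numbers are invented.

def total_shots_ratio(shots_for: int, shots_against: int) -> float:
    return shots_for / (shots_for + shots_against)

print(f"{total_shots_ratio(16, 9):.3f}")  # 16 shots for, 9 against -> 0.640
```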

Anyway, I was just reading this article by Mike L. Goodman (seriously, what would we do without Grantland these days?) and I was drawn to one particular paragraph, which discussed a concept that I think hasn’t completely been fleshed out – and may not be for some time – when it comes to descriptive vs. predictive statistics, and their merits. Here’s the passage:

“Arguing against the box score (and counting numbers) argues against using statistics for descriptive purposes. While traditional baseball stats are not particularly predictive of what will happen, they are very descriptive of what has happened. Try describing how the Baltimore Orioles played over the last week without using statistics, or try explaining how good Clayton Kershaw is. Stats that are increasingly becoming discredited in baseball don’t fail to describe how good or bad performance has been; they fail to explain the whys of that performance and, consequently, whether or not they would continue.” (Bolding my own)

Goodman claims that statistics that are descriptive but not predictive still have value in answering questions like “how well has team X played over the past week” or “how good is X player”. But is that really the case?

Let's take an example that hits close to home for hockey fans: PDO. It doesn't stand for anything, but I like to think of it as "Percentage-Driven Output". It is shooting percentage plus save percentage, and if that number is significantly higher than 100, you've probably been lucky, and vice versa. Why? Because there is no evidence that, in most cases, maintaining a number far from 100 is sustainable. Goodman, based on this article, would say that a number like goal differential – which incorporates both fairly sustainable elements (shot differentials) and not very sustainable ones (shooting success) – is useful in determining which teams are good or have been good, even if it doesn't predict which teams will be good in the near future.
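For illustration, here's PDO exactly as described, with invented sample numbers:

```python
# PDO as described above: on-ice shooting percentage plus on-ice save
# percentage, with league average sitting right around 100. The sample
# numbers are invented.

def pdo(shooting_pct: float, save_pct: float) -> float:
    """Both inputs in percent, e.g. 9.1 for 9.1% shooting, 92.4 for .924 goaltending."""
    return shooting_pct + save_pct

print(f"{pdo(9.1, 92.4):.1f}")  # 101.5 -> probably running hot
print(f"{pdo(6.8, 90.2):.1f}")  # 97.0  -> probably running cold
```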

But does that really make sense? On the one hand, we want to recognize past accomplishments, whether or not they are repeatable. Think Justin Williams and his game 7 prowess. But on the other hand, if it’s understood that success (or failure) isn’t sustainable, then is it really fair to use that success in descriptions as “good” or “bad”?

Let's look at this another way. Goodman says that conventional stats like RBI or goal differential are "descriptive of what has happened." On the surface, this is true. But if they don't give an accurate description of why things have happened, then what good do they do? In the end, they just feed one of the bigger roadblocks to analytics acceptance there is: the belief that we understand the past, when we don't.

Consider these quotes from two renowned authors and thinkers in the fields of psychology and statistics.

“The core of the illusion is that we believe we understand the past, which implies that the future also should be knowable, but in fact we understand the past less than we believe we do.” – Daniel Kahneman

“The illusion that we understand the past fosters overconfidence in our ability to predict the future.” – Nassim Taleb

It may seem harmless to say, "Albert Pujols had 4 RBI last week, so he was clearly at his best," but doing so in fact misrepresents not only the future, and not only the player's true talent, but also the past.

I wrote at the beginning of this post that I thought this was a concept that hasn't been completely fleshed out, and that might not be for some time. That is because although I think this argument is persuasive, I'm not sure it's the only one. As I wrote in the piece I linked earlier, I think unsustainable streaks of magic have their worth, and deserve to be celebrated. It's unclear to me, though – and, I believe, deep down to even the most adamant analysts – how these two views can be reconciled.