As many of you know, I love reading about baseball and soccer analytics. Because baseball was the “first mover,” later-adopting sports like hockey can borrow many of its concepts and ideas, even though the sports are quite different. Soccer, meanwhile, is quite a similar game to hockey – just a slower version – and therefore many of its more specific practices translate quite well.
Soccer is at an interesting point, though, because unlike baseball it is uncovering new statistics (courtesy of companies like Opta) while also just now figuring out which of those statistics are meaningful. You can get an idea of how far behind soccer analytics is from the fact that its main predictive statistic, Total Shots Ratio – essentially the soccer version of Corsi – is actually based on Corsi.
Anyway, I was just reading this article by Mike L. Goodman (seriously, what would we do without Grantland these days?) and I was drawn to one particular paragraph, which discussed a concept that I think hasn’t completely been fleshed out – and may not be for some time – when it comes to descriptive vs. predictive statistics, and their merits. Here’s the passage:
“Arguing against the box score (and counting numbers) argues against using statistics for descriptive purposes. While traditional baseball stats are not particularly predictive of what will happen, they are very descriptive of what has happened. Try describing how the Baltimore Orioles played over the last week without using statistics, or try explaining how good Clayton Kershaw is. Stats that are increasingly becoming discredited in baseball don’t fail to describe how good or bad performance has been; they fail to explain the whys of that performance and, consequently, whether or not they would continue.” (Bolding my own)
Goodman claims that statistics that are descriptive but not predictive still have value in answering questions like “how well has team X played over the past week” or “how good is X player”. But is that really the case?
Let’s take an example that hits close to home for hockey fans: PDO. It doesn’t stand for anything, but I like to think of it as “Percentage-Driven Output”. It is shooting percentage plus save percentage, and if that number is significantly higher than 100, you’ve probably been lucky; if it is significantly lower, you’ve probably been unlucky. Why? Because there has been no proof that, in most cases, a PDO far from 100 is sustainable. Goodman, based on this article, would say that a number like goal differential, which incorporates both fairly sustainable (shot differential) and not very sustainable (shooting success) elements, is useful in determining which teams are good or have been good, even if it doesn’t predict which teams will be good in the near future.
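To make the arithmetic concrete, here is a minimal Python sketch. The two teams and all of their numbers are made up purely for illustration, the function names are my own, and I’m assuming “shots” means shots on goal and that PDO is expressed on a 100-point scale, as described above.

```python
def pdo(goals_for, shots_for, goals_against, shots_against):
    """Shooting percentage plus save percentage, on a 100-point scale."""
    shooting_pct = 100 * goals_for / shots_for
    save_pct = 100 * (1 - goals_against / shots_against)
    return shooting_pct + save_pct


def goal_differential(goals_for, goals_against):
    """Goals scored minus goals allowed."""
    return goals_for - goals_against


# Hypothetical teams (made-up numbers, not real data).
# Each tuple is (goals_for, shots_for, goals_against, shots_against).
team_a = (40, 400, 30, 300)  # outshoots opponents, ordinary percentages
team_b = (36, 300, 26, 400)  # gets outshot, but the percentages run hot

for name, (gf, sf, ga, sa) in [("Team A", team_a), ("Team B", team_b)]:
    print(f"{name}: goal diff {goal_differential(gf, ga):+d}, "
          f"shot diff {sf - sa:+d}, PDO {pdo(gf, sf, ga, sa):.1f}")
```

Both made-up teams finish with the same goal differential, but Team A gets there by outshooting its opponents with a PDO of about 100, while Team B is outshot and rides a PDO of about 105.5. Goal differential alone describes them identically, which is exactly the tension in using it as a description of who has been “good”.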
But does that really make sense? On the one hand, we want to recognize past accomplishments, whether or not they are repeatable. Think Justin Williams and his Game 7 prowess. But on the other hand, if it’s understood that success (or failure) isn’t sustainable, then is it really fair to use that success to describe a team or player as “good” or “bad”?
Let’s look at this another way. Goodman says that conventional stats like RBI or goal differential are “descriptive of what has happened.” On the surface, this is true. But if they don’t give an accurate description of why things have happened, then what good do they do? In the end, they just add to one of the bigger roadblocks to the acceptance of analytics: the belief that we understand the past, when we don’t.
Consider these quotes from two renowned authors and thinkers in the fields of psychology and statistics.
“The core of the illusion is that we believe we understand the past, which implies that the future also should be knowable, but in fact we understand the past less than we believe we do.” – Daniel Kahneman
“The illusion that we understand the past fosters overconfidence in our ability to predict the future.” – Nassim Taleb
It may seem harmless to say, “Albert Pujols had 4 RBI last week, so he was clearly at his best,” but it is in fact misrepresenting not only the future, not only the player’s true talent, but also the past.
I wrote at the beginning of this post that I thought this concept hadn’t been completely fleshed out, and might not be for some time. That is because, although I think this argument is persuasive, I’m not sure it’s the only one. As I wrote in the link I posted earlier on, I think unsustainable streaks of magic have their worth, and deserve to be celebrated. It’s unclear to me, though – and I believe it is still unclear, deep down, in the minds of even the most adamant analysts – how these two views can be reconciled.