On whether descriptive, non-predictive stats really have use

As many of you know, I love reading about baseball and soccer analytics. Because baseball was the “first mover,” later-adopting sports like hockey can copy many of its concepts and ideas, even though the sports are quite different. Soccer, meanwhile, is quite a similar game to hockey, just a slower version, and therefore many of the more specific practices translate quite well.

Soccer is at an interesting point though because it, unlike baseball, is uncovering new statistics (courtesy of companies like Opta) while only now figuring out which of those statistics are meaningful. You can get an idea of how far behind soccer analytics is from the fact that its main predictive statistic, Total Shots Ratio (essentially the soccer version of Corsi), is actually based on Corsi.

Anyway, I was just reading this article by Mike L. Goodman (seriously, what would we do without Grantland these days?) and I was drawn to one particular paragraph, which discussed a concept that I think hasn’t completely been fleshed out – and may not be for some time – when it comes to descriptive vs. predictive statistics, and their merits. Here’s the passage:

“Arguing against the box score (and counting numbers) argues against using statistics for descriptive purposes. While traditional baseball stats are not particularly predictive of what will happen, they are very descriptive of what has happened. Try describing how the Baltimore Orioles played over the last week without using statistics, or try explaining how good Clayton Kershaw is. Stats that are increasingly becoming discredited in baseball don’t fail to describe how good or bad performance has been; they fail to explain the whys of that performance and, consequently, whether or not they would continue.” (Bolding my own)

Goodman claims that statistics that are descriptive but not predictive still have value in answering questions like “how well has team X played over the past week” or “how good is X player”. But is that really the case?

Let’s take an example that hits close to home for hockey fans: PDO. It doesn’t stand for anything, but I like to think of it as “Percentage-Driven Output”. It is shooting percentage + save percentage, and if that number is significantly higher than 100, you’ve probably been lucky; if it is significantly lower, you’ve probably been unlucky. Why? Because there has been no proof that, in most cases, a number far from 100 can be sustained. Goodman, based on this article, would say that a number like goal differential, which incorporates both fairly sustainable (shot differentials) and not very sustainable (shooting success) elements, is useful in determining which teams are good or have been good, even if it doesn’t predict which teams will be good in the near future.
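To make the arithmetic concrete, here is a minimal sketch in Python. The function name `pdo` and the team totals are my own inventions for illustration, not real data, and I am assuming the common convention of expressing both percentages on a 0 to 100 scale so that a league-average PDO lands at 100.

```python
def pdo(goals_for, shots_for, goals_against, shots_against):
    """Return PDO on the 100-point scale: shooting % plus save %."""
    # Share of the team's own shots that went in.
    shooting_pct = 100.0 * goals_for / shots_for
    # Share of the opponents' shots that the team's goalies stopped.
    save_pct = 100.0 * (shots_against - goals_against) / shots_against
    return shooting_pct + save_pct


# Hypothetical team totals, made up for illustration only.
team_pdo = pdo(goals_for=30, shots_for=300, goals_against=20, shots_against=320)
print(team_pdo)
```

With these made-up totals the team shoots 10% and stops 93.75% of shots against, so its PDO sits above 100. By the reasoning above, that is a hint that luck has played a part, not evidence of a repeatable skill.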

But does that really make sense? On the one hand, we want to recognize past accomplishments, whether or not they are repeatable. Think Justin Williams and his Game 7 prowess. But on the other hand, if it’s understood that success (or failure) isn’t sustainable, then is it really fair to describe that performance as “good” or “bad”?

Let’s look at this another way. Goodman says that conventional stats like RBI or goal differential are “descriptive of what has happened.” On the surface, this is true. But if they don’t give an accurate description of why things have happened, then what good do they do? In the end, they just add to one of the bigger roadblocks to analytics acceptance: the belief that we understand the past, when we don’t.

Consider these quotes from two renowned authors and thinkers in the fields of psychology and statistics.

“The core of the illusion is that we believe we understand the past, which implies that the future also should be knowable, but in fact we understand the past less than we believe we do.” – Daniel Kahneman

“The illusion that we understand the past fosters overconfidence in our ability to predict the future.” – Nassim Taleb

It may seem harmless to say, “Albert Pujols had 4 RBI last week, so he was clearly at his best,” but doing so in fact misrepresents not only the future and the player’s true talent, but also the past.

I wrote at the beginning of this post that I thought this was a concept that hadn’t been completely fleshed out, and might not be for some time. That is because although I think this argument is persuasive, I’m not sure it’s the only one. As I wrote in the link I posted earlier on, I think unsustainable streaks of magic have their worth, and deserve to be celebrated. It’s unclear to me, though, and I believe it remains unclear deep down in the minds of even the most adamant analysts, how these two views can be reconciled.




2 thoughts on “On whether descriptive, non-predictive stats really have use”

  1. I tried to comment on this last night from my phone, but I don’t think it registered. Apologies if it’s duplicated. I think descriptions of past events do have value, as a “shorthand.” It’s only when we ascribe predictive value to them that we go astray. Perhaps it is wrong to say, “Albert Pujols had 4 RBI last week, so he was clearly at his best” but it is not wrong to say “Albert Pujols went 4/4, so he had a good game.” Similarly, if we use Halak’s 2010 playoff save percentage (or whatever statistic) to summarize his great run, to me it’s not wrong UNLESS you use it to back a contention that Halak is therefore a great goalie.

    You have to remember that the performance of any athlete is a combination of good and bad plays, good and bad decisions, and random chance. Sidney Crosby can make a bad decision or a bad play, he can give the puck away and have it lead directly to a goal against. Douglas Murray can make a great defensive play (theoretically…). If analytics tells us that Sidney Crosby is great and Murray dreadful, that does not change the fact that they are capable of the opposite. It is not logical to ignore the components that make up our judgement of the player, or selectively ignore those that are not consistent with it. We have to see and describe all. The trick is in ascribing the correct weight to the evidence. If Murray makes a key defensive play that saves a sure goal, do you think Habs fans should boo it or sit on their hands because Murray is bad? Of course not. And if Murray had a run of good play (however improbable), shouldn’t we recognize it just as we recognize the bad play?


  2. Arik,

    I know you’re a pretty big tennis fan, so let’s talk about that for a second.

    One of the best sports psychology books I read as a teenager was called “Tennis: la preparation mentale” by Antoni Girod (http://www.amazon.fr/Tennis-pr%C3%A9paration-mentale-Antoni-Girod/dp/291829201X).

    It’s not a stats-oriented book by any means, but the author brings up an interesting point in the introduction, which I’ll paraphrase: Pete Sampras (or Nadal, or Djokovic, or Federer) is NOT the number 1 or number 2 tennis player in the world. Rather, Sampras (or anyone else) HAS a ranking of #1, #2, etc. To play great tennis, Girod believes that you need to dissociate yourself from numbers (whether your ranking or the scoreline of your last match) and look deeper to find the player you are. An example: my buddy Filip Peliwo was the #1 ranked junior player in the world back in 2012 before going pro, cracked the top 250 last year on the ATP tour and is now ranked 342 in the world. It is descriptive in a sense, but is it of any use in predicting the future? Time will tell.

    In any case, from an athlete’s point of view, it is definitely dangerous to base your mental schema (or worse, your general self-esteem) on what your ranking is, or how many goals you scored last season. It’s not really who you are.

