The wrong metric
Although this blog is ostensibly about books, I’ve written a lot about sports, mostly dealing with how non-scientist readers perceive statistical analysis of athlete productivity. This issue fascinates me; I think how people think about sports statistics provides a microcosm in how they may respond to similar treatments in the scientific realm. Economists, mathematicians, engineers and physicists will provide a better explanation of the analysis than I can. Instead, I want to focus on the people who draw (shall we say) interesting conclusions about research.
In a recent podcast, Bill Simmons interviewed Buzz Bissinger on the BS Report (July 28, 2010). Bissinger gained some negative exposure as he had railed against the blogosphere and sports analysis. In this podcast, Bissinger was given some time to elaborate on his thoughts. He most certainly is not a raving lunatic, but he did say a few things that I find representative of how statistical analyses are often misinterpreted by non-scientists (and even scientists.)
Bissinger took the opportunity to trash Michael Lewis’s Moneyball, mostly by pointing out how Billy Beane isn’t so smart, and that all in the end, the statistical techniques didn’t work – only Kevin Youkilis – mentioned in the book, had proven to be a success. I think that misses the point. Yes, the book documents the tension between the scouts and the stat-heads. I think Lewis chose this approach to make the book more appealing, by taking the human interest angle, than simply writing a technical description of Beane’s “new” approach. Perhaps Lewis overstates the case in showing how entrenched baseball GMs were in relying on eyeball and qualitative skill assessments, but the point I got from the book was that: Beane worked under money constraints. He needed a competitive edge. Most baseball organizations relied on scouts. Beane thought that to be successful, he needed to do something different (but presumably had some relevance) to provide baseball success.
Beane could have used fortune tellers; I think the technique in Moneyball (i.e. statistical analysis) is besides the point. Beane found something that was different and based more of his decisions on this new evaluation method. This is a separate issue from how well the new techniques performed. the first issue is whether the new technique told him something different. As it happens (as documented in Moneyball, Bill James’s Baseball Abstracts, and by many sports writers and analysts), it did. The result is that Beane was able to leverage that difference – in this case, he valued some abilities that others did not – and signed those players to his roster. The assumption is that if his techniques couldn’t give him anything different from previous methods of evaluation, than he would have had nothing to exploit.
The second point is whether the techniques told him something that was correct. And again, the stats did provide him with a metric that has a high correlation with winning baseball games – the on-base percentage. So one thing he was able to exploit was the perception in value of batting average (BA) versus on-base percentage (OBP). He couldn’t sign power hitters: GMs – and fans – like home runs. He avoided signing hitters with high BA and instead signed those with high OBP.
This led to a third point: Beane can only leverage OBP to find cheap players (and still win) so long as there were few GMs doing the same. Of course the cost of OBP will increase if others come onboard and have deep pockets (like the Yankees and the Red Sox.) So Beane – and other GMs – would have to become more sophisticated in how they draft and sign players. Especially if they work under financial constraints. As my undergraduate advisor said, “You have to squeeze the data.”
One valid point point Bissinger made was that the success of the Oakland A’s coincided with the Big Three pitchers. So clearly, Bissinger wrote off a significant amount of Oakland success to the three. That’s fine, as the question can be settled by looking at data. What annoyed me is when readers do not pay attention to the argument. I just felt that Moneyball was more about how one can find success by examining what everyone else is doing, and then doing something different. The only constraint is whether something different would bring success.
I felt that Bissinger is projecting when he assumes that using stats means the rejection of visual experience. The importance of Moneyball is in demonstrating that one can find success by simply finding out what people have overlooked. Once the herd follows, it makes sense to seek out alternative measures, or, more likely, to find out what others are ignoring. If the current trend is on high OBP and ignoring pitchers with a high win-count, then a smart GM needs to exploit what is currently undervalued. Statistics happens to be one such tool – but it isn’t the only tool.
And part of the reason I write this is, again, to highlight the fact that people usually have unvoiced assumptions about the metrics they use. The frame of reference is important. In science, we explicitly create yardsticks for every experiment we perform. We assess things as whether they differ from control. It is a powerful concept. And even if the yardstick is simply another yardstick, we can still draw conclusions based on differences (or even similarities, if one derives the same answer by independent means.)
This brings me to recent Joe Posnanski and David Berri posts. The three posts I selected all demonstrate the internal yardsticks (hidden or otherwise) that people use when they make comparisons. I am a fan of these writers. I think Posnanski has provided a valuable service in bridging the gap between analysis and understanding, facts and knowledge. Whether one agrees or disagrees with his posts, I think Posnanski is extremely thoughtful and clear about his assumptions and conclusions, which facilicates discussion. The post has a simple point: Posnanski wrote about “seasons for the ages.” A number of readers immediately wrote to him, complaining about how just about anyone who hits 50 home runs in a season would qualify. To which Posnanski coined a new term (kind of like a sniglet) – obviopiphany.He realized that most people simply associate home runs with a fantastic season for a hitter. That isn’t what Posnanski meant, and in the post he offers some correction.
The Posnanski post has a simple theme and an interesting suggestion: the outrage over steroids may be due to the fact that people assume that home run hitters are good hitters. Since steroids help power, the assumption is that steroids make hitters good – which in most cases simply means more home runs. But Posnanski – and others sabermetricians – propose that one must hit home runs in the context of getting fewer strikeouts and more walks. The liability involved in striking out more, and not walking, is too much and washes out the gains made from hitting the ball far. Thus Posnanski posts names a 5 players who are not in the Hall of Fame, and aren’t home run hitters, but who nevertheless produced at the plate – according to some advanced hitting metrics. I won’t go into this more, except to say that here, Posnanski makes his assumptions clear. He uses OBP+, wins above replacement player, and other advanced metrics to make his point. But it is telling that Posnanski had to stitch together the assumptions his readers had – that the yardstick for good hitting simply boils down to home runs.
The Berri posts describe something similar. One of them is from a guest contributor, Ben Gulker, writing about how Rajon Rondo was not going to be selected for Team USA in the world championship because he doesn’t gather enough points. The other highlights how the perception of Bob McAdoo changed as a function of the fortunes of his team. Interestingly enough, McAdoo became a greater point getter while becoming a less efficient shooter and turning the ball over more; at the same time, his reputation was burnished by the championships his teams won.
The story has been told many times by Berri. It seems that in general, basketball writers and analysts associate good players as those who score points (in the literal sense, regardless of shooting percentage) and who played on championship teams. There are several problems here. Point getting must take place in the context of a high shooting percentage. One must not turn the ball over, one must rebound, one must not commit an above average number of fouls, and hopefully get a few steals and blocks. I don’t think anyone would disagree that such a player is a complete player and ought to be quite desirable, regardless of how many championship rings he has or if he scores only 12 points a game. Berri has examined this issue of yardsticks, and he has found that what sports writers, coaches, and GMs think of players has an extremely high correlation with, simply, how many points they get (this is shown by what the writers write and how they vote for player awards, how often coaches play someone, and how much GMs pay players.) The verbiage writing up about the defensive prowess and the “little things” are ignored when the awards are given and fat contracts handed out. Point getters get the most accolades and the most money.
And the other point is how easily point getters reflect the luster of championships. Nevermind that no player can win alone, but this again is an example of how people end up with not only unspoken yardsticks, but also choose a frame of reference without analyzing if it is the correct one. The reference point is a championship ring. As has been documented, championships are not good indicators of good teams. The regular season is. This is simply due to sample sizes. More games are played in the regular season. Teams are more likely to arrive at their “true” performance level than in a championship tourney with a variable number of games – and frankly where streaks matter. A good team might lose four games in a row, in the regular season, but they may lose only 10 for the year. In a tournament, they would be bounced out if they lose four in a series.
In this context, the Premier League system in soccer makes sense. The best teams compete in a regular season; the team with the best record is the champion. So people who assume that a point-getter who plays on a championship is better than a player who shoots efficiently (but with fewer points) and rebounds/steals/blocks/does not turnover above average, and on a non-champion team, make two errors. They selected the wrong metric twice over.
With that said, I could only have made that point because of newer metrics that provide another frame of reference. Moreover, the new metrics tend to have improved predictive abilities over simply looking at point-getting totals. Among the new metrics, there are some that show a higher correlation with the scoring difference (and thus win/loss record) of teams. It doesn’t matter what they are, but an important point is that one can derive these conclusions about which metric is better or worse.
This is the main difference in scientific (of which I include athlete productivity analysis) and lay discourse. In the former, the assumptions are made bare and frames discussion. A good scientific paper (and trust me, there are bad ones) makes excruciatingly detailed descriptions of controls, the points of comparisons, any algorithms/formulae, and how things are compared. In the lay discourse, this isn’t the standard one would use, because communicating scientific findings to other scientists use a stylized convention. Using such a mode of communication with friends would make one a bore and a pedant – not to mention one would become lonely real quick.