Tag Archives: Bill Simmons

Megan McArdle’s The Up Side of Down is a good survey of the literature on the science of failure, resilience, and success. Books of this sort, written for popular consumption, generally suffer from the three-ring-binder effect: they are loose collections of research and interviews, organized by theme. In some cases, the research has already been presented in other contexts, both by the researchers themselves (Daniel Gilbert and Jonathan Haidt) and by other popularizers of behavioral science.

Luckily, Ms. McArdle’s approach is disarming and charmingly self-deprecating. Her binder, as it were, ties her own failures to the research she presents. Her failure to find a job, her inability to move past a relationship, and her experience combating 9/11 Truthers provide a human face for the statistics of neuropsychology research. Most importantly, she demonstrates the power inherent in recognizing when a path is failing and taking action to shut it down. Loss aversion supplies a motive for maintaining the status quo, and she explores variations on this theme.

As with most popular science books, there is a hint of the prescriptive here. Ms. McArdle supports a more generous approach to mistakes and wishes that political forces would stop pushing toward harsher punishment for them.

Despite the compelling theme, one I tend to agree with, I find these books shallow. To Ms. McArdle’s credit, I would absolutely love for her to expand on just about every chapter. As it is, she combines general lessons learned from investigators and from her own life. It is effective. Take it for what you will; if you want more, follow up on her bibliography. The book is compelling.

I found useful lessons, especially in her emphasis on giving kids a safe place to fail. Ever since I became aware of research on the counterproductive effect of praising intelligence rather than effort (actually, of pointing out anything aside from effort), I’ve focused on the process. (There’s also new research suggesting that merely visualizing directions, up versus down or flying versus digging, might affect cognitive tasks because of the emotionality of the visualization.) Focusing on process is actually nicer and easier in some ways, because it gives adults cues to talk about specific things in the child’s project (“Oooh! I like how you did the trees and arranged them according to perspective!”).

Ms. McArdle’s book reminds us that it is not only OK but necessary to identify faults, especially when kids are younger and the stakes are lower: they can immediately see where they went wrong and correct it. The key is to be gentle enough to call attention to the mistake without dwelling on it. Make it feel like a bump; comment and move on.

Although I wish Ms. McArdle had spent more time developing the idea and presenting more research, I agree with her that the ability to remain calm, rather than focusing on the emotional sting of shame and failure, is absolutely crucial to moving on. Perhaps becoming accustomed to the iterative process of fail/identify/improve will help desensitize kids to the emotional turmoil of being wrong, so that they eventually focus on the substance of criticism.

I happen to think there’s a lot to learn from Ms. McArdle’s book, and I can draw many parallels to the process of science. My colleagues and I have joked that we are in an asymmetric relationship: the science has all the power. We work, but our feedback is generally negative. Our advisors and supervisors give comments for improvement (ask anyone about the process of writing a grant or manuscript), and upon submission we receive still more feedback: the paper is rejected, or it won’t fit the journal. If it is accepted provisionally, we get more feedback from reviewers. Grants also get scored, and we receive comments.

But we all understand this is the process. The worst comment on a grant is no comment at all: it means the grant was so bad that it was not worth the reviewer’s time to improve it.

And of course, a lot of our time is spent dealing with null or opposite results: no change where change is expected, change where stasis is expected, an effect that is too small or opposite to what you predicted. And things break and stop working all the time. A lot of these errors come down to the experiments and analysis (perhaps an incorrect baselining or normalization).

But when experiments start pulling together and a paper is eventually accepted, it is exactly like the first sunlight after an arctic winter. The rest of the time, it’s that arctic darkness.

Sorry; do I sound bitter?

I’m sure authors/writers/reporters all have analogous stories. The point is that success is more about attrition and self-selection. The people who thrive and have careers all continue to produce and deal with failures as if they are minor. They integrate criticism, iterate, and improve. So yes, I pretty much buy into Ms. McArdle’s thesis.

One thing I like about the book is that she tackles the distinction between normative errors and accidents. The distinction is important to make, even if the definitions are not clear cut. Accidents are events that couldn’t really have been accounted for in planning and execution. The operative word is could. Many things can and do happen, but what defines an accident is that it is coincidental, with the unfortunate victim falling prey to a low-probability event.

Normative errors arise during process and execution, from missed steps. The operative word here is should: generally, there are a few things that should have been done but weren’t. The two seem to differ by degree. I suppose that if you find yourself linking a series of events (if only I had walked a few steps quicker or slower, I would have turned the corner and seen the guys backing out with the large pane of glass instead of walking into it), this is probably an accident.

A mistake can probably be traced to something one did or didn’t do, and a compounded mistake just means many people failed down the line. I can see how some readers might want clearer explanations.

But the book is not explicitly about mistakes; it is about how we recover from them.

Ms. McArdle has put together a rather compelling book. She connects threads of research on attention, motivation, and economics and draws new observations from them. I especially liked her chapter on tunnel vision (“inattentional blindness”). She starts with Daniel Simons’s and Christopher Chabris’s experiment in which students watch a video and count the number of times a basketball team passes the ball. Afterwards, the researchers ask the students about the number of passes, and whether they saw a gorilla mascot run through the middle of the court, between the players. She segues into an analysis of the Dan Rather/President G.W. Bush National Guard story that cost Mr. Rather his job. Dan Rather made the mistake of defending his decision rather than simply working to figure out whether something had gone wrong.

There was apparently a whole chain of mistakes, but the point is that there is power in simply acknowledging that he could have been at fault. The proper play would have been along the lines of Ira Glass’s handling of Mike Daisey’s Apple story, where Mr. Glass admitted he was wrong and then spent a subsequent hour analyzing the mistakes he and his team made, while correcting the original story. A hot-off-the-press example is how Bill Simmons dealt with the Dr. V’s Magical Putter story.

I do hope people read Ms. McArdle’s book. I think she has a talent for providing proper context and tackling the best and most relevant arguments between opposing views (see her chapters on bankruptcy, welfare reform, and moral hazard). For the short time it takes to read the book, I think readers will gain an immeasurable sense of well-being as they learn to love mistakes.

Joe Posnanski has written another thoughtful piece on the divide between writers of a statistical bent and those who prefer the evidence of their eyes. I highly recommend it; Posnanski distills the arguments into one about stories. Do statistics ruin them? His answer is no. If anything, statistics let one tell other stories, if not necessarily better ones. He approached this by examining how one statistic, “Win Probability Added,” helped him look at certain games with fresh eyes.

My only comment is that I’ve noticed on his and other sites (such as Dave Berri’s Wages of Wins Journal) that one difficulty in getting non-statisticians to look at numbers is that they tend to desire certainty. What they usually get from statisticians, economists, and scientists is reams of ambiguity. The problem comes not when someone labels Michael Jordan the greatest player of all time*; the problem comes when one is left trying to rank merely great players against each other.

* Interestingly enough, the post I linked to turns out to be one where Prof. Dave Berri was defending himself against a misperception. Writers such as Matthew Yglesias and King Kaufman had misread Prof. Berri’s argument using his Wins Produced and WP48 statistics, thinking that Prof. Berri had written that other players were “more productive” than Jordan. Prof. Berri replied, “Did not,” but also offered some nuanced approaches to how one might look at statistics. In summary, Prof. Berri focused on how far Jordan’s performance stood above that of his contemporary peers.

The article I linked to about Michael Jordan shows that, when one compares numbers directly, care should be taken to place them in context. For example, Prof. Berri notes that in his book The Wages of Wins he devoted a chapter to “The Jordan Legend.” At one point, though, he writes that

 in 1995-96 … Jordan produced nearly 25 wins. This lofty total was eclipsed by David Robinson, a center for the San Antonio Spurs who produced 28 victories.

When we examine how many standard deviations each player is above the average at his position, we have evidence that Jordan had the better season. Robinson’s WP48 of 0.449 was 2.6 standard deviations above the average center. Jordan posted a WP48 of 0.386, but given that shooting guards have a relatively small variation in performance, MJ was actually 3.2 standard deviations better than the average player at his position. When we take into account the realities of NBA production, Jordan’s performance at guard is all the more incredible.

If one simply looked at the numbers, it would seem a conclusive argument that Robinson, having produced more “wins” than Jordan, was the better player. The nuance comes when Prof. Berri places those numbers in context. Centers, working closer to the basket, ought to have more high-percentage shooting opportunities, rebounds, and blocks. His metric of choice, WP48, takes these into consideration. When one then looks at how far Robinson performed above his proper comparison group (i.e., other centers), his season looks exceptional against players at all positions but less extraordinary against other centers. Jordan’s performance, compared with other guards, shows him to be in a league of his own.

That argument was accomplished by taking absolute numbers (generated for all NBA players, for all positions) and placing them into context (comparing to a specific set of averages, such as by position.)
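The positional comparison above amounts to a z-score: subtract the average for the player’s position, then divide by that position’s standard deviation. Here is a minimal Python sketch. Only the two WP48 values (0.449 and 0.386) come from Berri’s quote. The positional means and standard deviations are hypothetical placeholders, chosen only so the results reproduce the 2.6 and 3.2 he reports; they are not his actual figures.

```python
def z_score(value: float, position_mean: float, position_sd: float) -> float:
    """Standard deviations above (or below) the average player at the same position."""
    return (value - position_mean) / position_sd

# WP48 values quoted by Berri for 1995-96.
robinson_wp48 = 0.449
jordan_wp48 = 0.386

# HYPOTHETICAL positional baselines (not Berri's data), picked only so that
# the z-scores match the 2.6 and 3.2 quoted in the text.
center_mean, center_sd = 0.150, 0.115
guard_mean, guard_sd = 0.098, 0.090

robinson_z = z_score(robinson_wp48, center_mean, center_sd)
jordan_z = z_score(jordan_wp48, guard_mean, guard_sd)

# Raw numbers favor Robinson; comparison within position favors Jordan.
print(f"Raw WP48: Robinson {robinson_wp48:.3f}, Jordan {jordan_wp48:.3f}")
print(f"z-score:  Robinson {robinson_z:.1f}, Jordan {jordan_z:.1f}")
```

Neither player’s raw number changes here. The ranking flips only because the reference group (and its spread) changes.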

This is where logic, math, and intuition can get you. I don’t think most people would have trouble understanding how Prof. Berri constructed his arguments. He tells you where his numbers came from and why they might go against “conventional wisdom,” and in this case, the way he structured his analysis resolved the difference. (His analyses don’t always end up confirming conventional wisdom; see his discussions of Kobe Bryant.)

However, I would like to focus on the fact that Prof. Berri’s difficulties came when his statistics generated larger numbers for players not named Michael Jordan. (I will refer people to a recent post on the Wages of Wins Journal listing the top 50 NBA players.*)

* May increase blood pressure.

In most people’s minds, that clearly leads to a contradiction: how can this guy, with smaller numbers, be better than the other guy? Another way of putting this is: differences in numbers always matter, and they matter in the way “intuition” tells us.

In this context, it is understandable why people give such significance to .300 over .298. One is larger than the other, and it’s a round number to boot. Over 500 at-bats, the difference between a .300 hitter and a .298 hitter translates to 1 hit. For most people who work with numbers, such a difference is effectively non-existent. However, if one were performing “rare-event” screening, such as looking for cells in the bloodstream marked with a probe that “lights up” cancer cells, then a difference of 1 or 2 might matter. In that case, the context is that, over a million cells, one might expect to see 5 or so false positives by chance in a person without cancer, while in a person with cancer that number may jump to 8 or 10.
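Both halves of that paragraph can be checked with a little arithmetic. This is a minimal Python sketch. The batting numbers are the ones in the text. For the screening case I am assuming, for illustration only, that chance false positives follow a Poisson distribution with a mean of 5 per million cells. That model is my assumption; the text does not specify one.

```python
import math

# Batting-average arithmetic over 500 at-bats.
at_bats = 500
hits_300 = round(0.300 * at_bats)  # a .300 hitter
hits_298 = round(0.298 * at_bats)  # a .298 hitter
print("Hit difference over 500 at-bats:", hits_300 - hits_298)

def poisson_tail(k: int, mean: float) -> float:
    """P(X >= k) for a Poisson-distributed count X with the given mean."""
    below = sum(math.exp(-mean) * mean**i / math.factorial(i) for i in range(k))
    return 1.0 - below

# ASSUMPTION: chance false positives per million cells ~ Poisson(mean=5).
expected_false_positives = 5
for observed in (8, 10):
    p = poisson_tail(observed, expected_false_positives)
    print(f"P(at least {observed} positives by chance alone) = {p:.3f}")
```

Under that assumption, a person without cancer would show 8 or more positives by chance roughly 13% of the time, and 10 or more only about 3% of the time. A difference of a few counts carries real information here, while 1 hit in 500 at-bats does not.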

For another example, try Bill Simmons’s ranking of the top 100 basketball players in The Book of Basketball. Frankly, a lot of the descriptions, justifications, arguments, and yes, statistics that Simmons cites look similar. My point, however, is that in his mind the ranking matters: the 11th best player of all time lost something by not making the top 10, but he is still better off than the 12th best player. Again, as someone who works with numbers, I think it might make more sense to just sort players into cohorts. The interpretation is that, at some level, any group of 5 (or even 10) players ranked near one another are practically interchangeable in how they practice their craft. The differences between two teams of such players matter only to people forced to make predictions, like sportswriters and bettors. That said, if one is playing GM, it is a perfectly valid criterion to assemble a team of these players based on some aesthetic consideration. It is just as valid to simply go down a list and pick the top 5 players as ordered by some statistic.* If two people pick their teams in a similar fashion, it is likely a crap shoot as to which will be the better team in any one-off series. Over time (like an 82-game season), such differences may become magnified. Even then, the win difference between the two teams may be only 2 or 3.

* Although some statistics are better at accounting for variance than others.
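Sorting a ranked list into cohorts is simple to express. The sketch below is a hypothetical helper of my own (not something Simmons or Berri uses). It maps a 1-based rank onto a cohort of a chosen size, so ranks 1-5 become cohort 1, ranks 6-10 become cohort 2, and so on.

```python
def cohort(rank: int, size: int = 5) -> int:
    """Map a 1-based ranking onto a cohort number: ranks 1-5 -> 1, 6-10 -> 2, and so on."""
    if rank < 1:
        raise ValueError("rank must be 1 or greater")
    return (rank - 1) // size + 1

# With cohorts of 5, the 11th and 12th best players land in the same group,
# so their relative order is no longer treated as meaningful.
for rank in (10, 11, 12):
    print(f"rank {rank}: cohort {cohort(rank)} (size 5), cohort {cohort(rank, 10)} (size 10)")
```

Any fixed cut-off still draws a line somewhere. With cohorts of 5, the 10th and 11th players are split, and with cohorts of 10, they are split again. The point is only that order within a cohort stops carrying weight, not that the boundaries are free of arbitrariness.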

How this leads back to Posnanski is as follows. In a lot of cases, he does not simply rank numbers; he is, partly, a writer and storyteller. The numbers are not the point; the numbers illustrate. Visually, there isn’t always a glaring difference between them, especially when one looks at the top performances.

Most often, the tie-breaker comes down to the story, or rather, to what Posnanski wishes to demonstrate. He’ll find other reasons to value one performance over another. In the Posnanski post I mentioned, I don’t think the piece would have made as good a story had it ended differently, even if it had still illustrated his argument well.

Although this blog is ostensibly about books, I’ve written a lot about sports, mostly about how non-scientist readers perceive statistical analysis of athlete productivity. This issue fascinates me; I think how people think about sports statistics provides a microcosm of how they may respond to similar treatments in the scientific realm. Economists, mathematicians, engineers, and physicists will provide a better explanation of the analysis than I can. Instead, I want to focus on the people who draw (shall we say) interesting conclusions about research.

In a recent podcast, Bill Simmons interviewed Buzz Bissinger on the BS Report (July 28, 2010). Bissinger gained some negative exposure after he railed against the blogosphere and sports analysis. In this podcast, Bissinger was given some time to elaborate on his thoughts. He most certainly is not a raving lunatic, but he did say a few things that I find representative of how statistical analyses are often misinterpreted by non-scientists (and even scientists).

Bissinger took the opportunity to trash Michael Lewis’s Moneyball, mostly by arguing that Billy Beane isn’t so smart and that, in the end, the statistical techniques didn’t work: only Kevin Youkilis, who is mentioned in the book, had proven to be a success. I think that misses the point. Yes, the book documents the tension between the scouts and the stat-heads. I think Lewis took this human-interest angle to make the book more appealing than a technical description of Beane’s “new” approach would have been. Perhaps Lewis overstates how entrenched baseball GMs were in relying on eyeball and qualitative skill assessments, but the point I got from the book was this: Beane worked under money constraints. He needed a competitive edge. Most baseball organizations relied on scouts. Beane thought that to be successful, he needed to do something different (but presumably relevant) to win baseball games.

Beane could have used fortune tellers; I think the technique in Moneyball (i.e., statistical analysis) is beside the point. Beane found something different and based more of his decisions on this new evaluation method. How well the new techniques performed is a separate issue. The first issue is whether the new technique told him something different. As it happens (as documented in Moneyball, in Bill James’s Baseball Abstracts, and by many sportswriters and analysts), it did. The result is that Beane was able to leverage that difference (in this case, he valued some abilities that others did not) and sign those players to his roster. The assumption is that if his techniques couldn’t tell him anything different from previous methods of evaluation, then he would have had nothing to exploit.

The second point is whether the techniques told him something that was correct. And again, the stats did provide him with a metric that correlates highly with winning baseball games: on-base percentage. So one thing he was able to exploit was the gap between the perceived value of batting average (BA) and that of on-base percentage (OBP). He couldn’t afford power hitters: GMs and fans like home runs. He avoided signing hitters with high BA and instead signed those with high OBP.
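The BA/OBP distinction becomes concrete with the standard formulas: BA = H / AB, and OBP = (H + BB + HBP) / (AB + BB + HBP + SF). The Python sketch below uses those formulas. The two stat lines are hypothetical, invented only to show how a hitter with the lower batting average can still reach base more often.

```python
def batting_average(hits: int, at_bats: int) -> float:
    """BA = H / AB."""
    return hits / at_bats

def on_base_percentage(hits: int, walks: int, hbp: int, at_bats: int, sac_flies: int) -> float:
    """OBP = (H + BB + HBP) / (AB + BB + HBP + SF)."""
    return (hits + walks + hbp) / (at_bats + walks + hbp + sac_flies)

# HYPOTHETICAL stat lines, for illustration only.
contact_hitter = dict(hits=150, walks=30, hbp=2, at_bats=500, sac_flies=5)
patient_hitter = dict(hits=135, walks=90, hbp=5, at_bats=500, sac_flies=5)

for name, line in (("contact", contact_hitter), ("patient", patient_hitter)):
    ba = batting_average(line["hits"], line["at_bats"])
    obp = on_base_percentage(**line)
    print(f"{name} hitter: BA {ba:.3f}, OBP {obp:.3f}")
```

The contact hitter looks better by the traditional .300 yardstick (.300 vs. .270). The patient hitter gets on base more often (about .383 vs. .339), which is exactly the kind of gap a GM pricing players by batting average would overlook.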

This leads to a third point: Beane can only leverage OBP to find cheap players (and still win) as long as few other GMs are doing the same. Of course, the cost of OBP will increase if others come on board, especially those with deep pockets (like the Yankees and the Red Sox). So Beane, and other GMs, would have to become more sophisticated in how they draft and sign players, especially if they work under financial constraints. As my undergraduate advisor said, “You have to squeeze the data.”

One valid point Bissinger made was that the success of the Oakland A’s coincided with the Big Three pitchers. Clearly, Bissinger attributed a significant amount of Oakland’s success to the three. That’s fine, as the question can be settled by looking at data. What annoyed me is when readers do not pay attention to the argument. I just felt that Moneyball was more about how one can find success by examining what everyone else is doing, and then doing something different. The only constraint is whether that something different would bring success.

I felt that Bissinger was projecting when he assumed that using stats means rejecting visual experience. The importance of Moneyball is in demonstrating that one can find success by simply finding out what people have overlooked. Once the herd follows, it makes sense to seek out alternative measures or, more likely, to find out what others are ignoring. If the current trend is to favor high OBP and ignore pitchers with high win counts, then a smart GM needs to exploit what is currently undervalued. Statistics happens to be one such tool, but it isn’t the only tool.

And part of the reason I write this is, again, to highlight the fact that people usually have unvoiced assumptions about the metrics they use. The frame of reference is important. In science, we explicitly create yardsticks for every experiment we perform. We assess things by whether they differ from a control. It is a powerful concept. And even if the yardstick is simply another measurement, we can still draw conclusions based on differences (or even similarities, if one derives the same answer by independent means).

This brings me to recent posts by Joe Posnanski and David Berri. The three posts I selected all demonstrate the internal yardsticks (hidden or otherwise) that people use when they make comparisons. I am a fan of these writers. I think Posnanski has provided a valuable service in bridging the gap between analysis and understanding, facts and knowledge. Whether one agrees or disagrees with his posts, Posnanski is extremely thoughtful and clear about his assumptions and conclusions, which facilitates discussion. The background to his post is simple: Posnanski wrote about “seasons for the ages,” and a number of readers immediately wrote to him complaining that just about anyone who hits 50 home runs in a season would qualify. In response, Posnanski coined a new term (kind of like a sniglet): obviopiphany. He realized that most people simply associate home runs with a fantastic season for a hitter. That isn’t what Posnanski meant, and in the post he offers some correction.

The Posnanski post has a simple theme and an interesting suggestion: the outrage over steroids may stem from the assumption that home run hitters are good hitters. Since steroids help power, the assumption is that steroids make hitters good, which in most cases simply means more home runs. But Posnanski and other sabermetricians propose that home runs must be weighed in the context of strikeouts and walks. The liability of striking out more and walking less is too great and washes out the gains made from hitting the ball far. So Posnanski names five players who are not in the Hall of Fame and aren’t home run hitters, but who nevertheless produced at the plate according to some advanced hitting metrics. I won’t go into this further, except to say that here Posnanski makes his assumptions clear. He uses OBP+, wins above replacement player, and other advanced metrics to make his point. But it is telling that Posnanski had to tease out the assumption his readers held: that the yardstick for good hitting simply boils down to home runs.

The Berri posts describe something similar. One of them is by a guest contributor, Ben Gulker, writing about how Rajon Rondo was not going to be selected for Team USA at the world championship because he doesn’t score enough points. The other highlights how the perception of Bob McAdoo changed as a function of his team’s fortunes. Interestingly enough, McAdoo became a bigger point getter while becoming a less efficient shooter and turning the ball over more; at the same time, his reputation was burnished by the championships his teams won.

Berri has told this story many times. It seems that, in general, basketball writers and analysts regard good players as those who score points (in the literal sense, regardless of shooting percentage) and who played on championship teams. There are several problems here. Point getting must take place in the context of a high shooting percentage. One must not turn the ball over, one must rebound, one must not commit an above-average number of fouls, and hopefully one gets a few steals and blocks. I don’t think anyone would disagree that such a player is a complete player and ought to be quite desirable, regardless of how many championship rings he has or whether he scores only 12 points a game. Berri has examined this issue of yardsticks, and he has found that what sportswriters, coaches, and GMs think of players correlates extremely highly with, simply, how many points they score (as shown by what writers write and how they vote for player awards, how often coaches play someone, and how much GMs pay players). The verbiage written about defensive prowess and the “little things” is ignored when awards are given and fat contracts handed out. Point getters get the most accolades and the most money.

The other point is how easily point getters take on the luster of championships. Never mind that no player can win alone; this again is an example of how people end up not only with unspoken yardsticks but also with a frame of reference chosen without analyzing whether it is the correct one. The reference point is a championship ring. As has been documented, championships are not good indicators of good teams; the regular season is. This is simply due to sample size. More games are played in the regular season, so teams are more likely to arrive at their “true” performance level than in a championship tournament with a variable number of games, where, frankly, streaks matter. A good team might lose four games in a row in the regular season yet lose only 10 for the year. In a tournament, they would be bounced out if they lost four in a series.
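The sample-size argument can be made concrete with a binomial model. This is a minimal Python sketch. It assumes, as a simplification, that games are independent and that the better team wins each game with the same fixed probability (here a hypothetical 60%). That assumption ignores home court, injuries, and the very streakiness mentioned above. The longer series uses 81 games rather than 82 only because a best-of series needs an odd length.

```python
from math import comb

def series_win_probability(p: float, games: int) -> float:
    """Probability that a team winning each game independently with probability p
    wins a best-of-`games` series (games must be odd).

    Playing out every game never changes who reaches the majority, so this equals
    P(at least games // 2 + 1 wins in `games` independent trials)."""
    if games % 2 == 0:
        raise ValueError("use an odd number of games")
    needed = games // 2 + 1
    return sum(comb(games, k) * p**k * (1 - p) ** (games - k) for k in range(needed, games + 1))

# ASSUMPTION: the better team wins 60% of its games against this opponent.
for n in (1, 7, 81):
    print(f"best-of-{n}: better team wins with probability {series_win_probability(0.6, n):.3f}")
```

Under these assumptions, the better team wins a best-of-7 only about 71% of the time. Over 81 games it comes out ahead well over 95% of the time, which is the sense in which the regular season is the better yardstick.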

In this context, the Premier League system in soccer makes sense: the best teams compete over a regular season, and the team with the best record is the champion. So people who assume that a point getter on a championship team is better than a player on a non-champion team who shoots efficiently (but scores fewer points), rebounds, steals, blocks, and avoids turnovers at above-average rates make two errors. They select the wrong metric twice over.

With that said, I could only have made that point because of newer metrics that provide another frame of reference. Moreover, the new metrics tend to have improved predictive abilities over simply looking at point-getting totals. Among the new metrics, there are some that show a higher correlation with the scoring difference (and thus win/loss record) of teams. It doesn’t matter what they are, but an important point is that one can derive these conclusions about which metric is better or worse.

This is the main difference between scientific discourse (in which I include athlete productivity analysis) and lay discourse. In the former, the assumptions are laid bare and frame the discussion. A good scientific paper (and trust me, there are bad ones) gives excruciatingly detailed descriptions of controls, points of comparison, any algorithms/formulae, and how things are compared. In lay discourse, this isn’t the standard one would use, because communicating scientific findings to other scientists uses a stylized convention. Using such a mode of communication with friends would make one a bore and a pedant, not to mention lonely real quick.

I read Bill Simmons’s The Book of Basketball. I enjoyed it; it is a fun survey of NBA history. The book isn’t just a numbers game or a breakdown of plays. It includes enough human interest that it should appeal to a casual fan or even indifferent parties (like me; I can count the number of basketball games I’ve seen, on TV or live, on both hands). Simmons does a fantastic job of conveying his love of basketball. For me, he really brought different basketball eras to life, inserting comments from players, coaches, and sportswriters. He also seems fairly astute in breaking down plays and describing the flow of the game.

Yes, I bought the book because I like Bill Simmons’s writing. If you enjoy his blog, you will find the same breezy conversational style here. The man has a gift for dropping pop culture references and making them germane to his arguments. But what I like most is that he is earnest in trying to understand, and to make his readers appreciate, the people who play a game for a living.

His segment on Elgin Baylor was moving in showing how racism affected this one man; in some ways, it was probably more effective than if he had just talked in general terms about the 1960s. His whole book works because it stays at the personal level. Even in his discussion of teams and individual players, he takes pains to discuss how each person was and is regarded by peers and teammates.

In this way, I think Simmons did a fantastic job of making the case that basketball can carry as much historical perspective as baseball. This is something that should not have to be argued, but baseball has a lock on “the generational game by which history can be measured” status. What seems important is that there are human elements that make a sport accessible across generations: fathers taking their sons to games, talking about the games and players, the excitement of watching breathtaking physical acts that expand how one views the human condition, and the joy and agony of championship wins and losses. While baseball’s slow pace lends itself to the way history moves (periods where nothing seems to happen, punctuated by drama), that doesn’t mean other sports exist in a vacuum. Style of play, the way the players are treated, and the composition of the player demographic all reflect the times. These games can be a reflection of society, and one can see the influence of racial injustice in something as mundane as box scores as integration occurred.

Simmons blends basketball performance, its history, and its social environment effectively; examples can be found in his discussions of Dr. J, Russell, Baylor, Kareem, and Jordan. In discussing why there probably won’t be another Michael Jordan (or Hakeem, or Kevin McHale), he takes inventive routes. Most of his points relate to pressures from society and the basketball environment. Players are drafted sooner, the high pay scale for draft picks lowers their motivation to prove their worth, and perhaps society itself would actively discourage players from behaving as competitively as Jordan did. I suppose it’s interesting, but I’m not sure that matters so much if the player is perceived to be an excellent player. Regardless, it seems to me that Simmons has been thinking about these things for some time. And I found it fun to read his take on basketball.

And I liked this book because it gives the lie to the weird view that someone who hasn’t done something cannot make reasonable, intelligent statements about it. Simmons wasn’t a professional basketball player, but he certainly uses every resource available to absorb the history and characters populating the game. He read a fair bit, he watched and rewatched games, he talked to players, he talked to people who covered basketball, and he watched some more. And he isn’t afraid to raise issues that occur to readers; you’ll see what I mean when you read his footnotes.

The book (and his podcast) confirms my opinion of Simmons as the smart friend who’d be a blast to have around (one who bleeds Celtics green, watches sports for a living, and must keep up with Hollywood gossip, gambling, and pop culture because they give him ammunition for columns).

***

There are some issues with the book, mainly in how statistical analysis of basketball is portrayed. I should be upfront and say that these issues did not detract from his arguments (for reasons that will become clear later), but I wish he had reconciled eyeball and statistical information. And because I’ve decided one focus of this blog should be how non-scientists deal with science (and scientists), I thought I should offer some thoughts on these issues.

I am somewhat undecided about how Simmons (and I suppose I am using him as a proxy for all “non-scientists”) actually feels about statistics. He claims that team sports like basketball and football are fundamentally different from baseball: the team component of the former increases the number of additive and subtractive interactions, while the latter is composed of individual units of performance. The increase in complexity, in his view, makes basketball difficult to model. So he discards so-called simple measures of NBA player performance like WP48, PER, and adjusted plus-minus.

His rationale is that these indicators ought to back up existing observations about NBA players. So Kobe Bryant needs to be ranked as a top-20 player of all time (WP48 ranks Bryant as a superior player, like Paul Pierce, rather than a step or two behind Michael Jordan). It seems like he wants statistics to tell him what he wants to hear, when in fact statistics help you see things you don’t otherwise see.

But then that leads to my second point about Simmons: why does he need the model to back up his mental model of player performance? Put differently, why can’t he accept differences in rankings calculated by some turn-the-crank-spit-out-value model? I think Simmons lacks a nuanced view of how these numbers ought to be interpreted, and he refuses to see that a simple model can capture a great many things about a complex system. Sure, once you’ve set up your criteria (like some level of significance you are willing to accept), you align everything by them, but there is room for judgement as to where that line is drawn.

Another way of describing a complex system is to say that there are many things going on at once, all interacting in some way. There are 10 players on a basketball court. One player, with the ball, has options to pass, to shoot, or to move the ball. Within each of these options, he has a set of suboptions: which one of the other four guys do I pass to? Who’s open? Which open player has a good shot from where he is? Am I in my optimal position to shoot? Do I need to drive to the basket or kick the ball out to the perimeter? There are many more possibilities than these.

***

At one level, Simmons is right; it is useful to break things down into “hyperintelligent” stats: identifying the tendencies of players (whether a player likes breaking to his left or right when he starts driving from the top of the key, whether he is equally good at shooting with his left or right hand, how often he does a turnaround, a fadeaway, or drives to the hoop), trying to figure out how many forced errors a defender creates, how often unforced turnovers happen (like someone dribbling off his foot), how many blocks get slapped out of bounds vs. being tipped to get possession, and so on.

But isn’t it just as intelligent to find an easy way of collapsing the complex game into a simple “x + y” formula? On several occasions, Simmons uses a short quote (and praises the person who said it) that captures everything he wanted to say in 15 pages. A simple model is analogous to that short quote.

More importantly, what if we didn’t need all these hyperintelligent stats to capture the essence of the game?

I just switched the problem from one of identifying player performance and productivity to one of capturing the game in broad strokes. The two ideas are of course related but still distinct, and they should not be confused.

This gets back to the original motives of the person who does the modeling.

If it’s a scientist or economist, I’ll tell you now that he is interested in getting the most impact with the least amount of work. He probably has to teach, run a lab/research program, and write grants and publications. He doesn’t have time to break game film down. And he certainly does not have the money to hire someone to look at game film (although I am sure he’ll have no lack of applicants for the job.) He spends his money finding people to do research and teach. If his research program is into finding ways to measure worker productivity, he will probably start with existing resources. So fine; he now has a database of NBA player box scores.

He’ll want to link these simple measures of player output to wins and losses. But players score points, not wins, and thankfully the difference between points scored and points given up correlates extremely well with wins and losses.
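To make “correlates extremely well” concrete, here is a minimal Python sketch of the Pearson correlation between average point difference and winning percentage. The six teams and their numbers are invented for illustration; they are not real NBA data, and `pearson_r` is just a hand-written helper.

```python
import numpy as np

# Hypothetical season averages for six made-up teams (not real NBA data):
# average point difference per game, and fraction of games won.
point_diff = np.array([-8.0, -4.5, -1.0, 0.5, 3.0, 7.5])
win_pct = np.array([0.25, 0.35, 0.47, 0.52, 0.60, 0.75])

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()  # deviations from the mean
    yc = y - y.mean()
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))

r = pearson_r(point_diff, win_pct)
print(f"r = {r:.3f}")
```

A value near 1 means teams line up almost perfectly: the bigger the average margin, the higher the winning percentage. Real league data is noisier than these made-up numbers.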

From there, it is relatively simple to do a linear regression over all players on all teams, finding how each box-score stat relates to each team’s point difference. And as noted, some metrics have a higher correlation with the point difference. (I will not use the term “differential” to mean difference; differentials belong to diff EQs.) Regardless, it seems to be an affliction of males that they rank things; so the researchers have these numbers, and it’s trivial to list players from high to low.
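The regression step can be sketched in a few lines of Python with NumPy’s least-squares solver. This is an illustration of ordinary least squares on synthetic data, not Berri’s actual model: the three per-game margins (rebounds, turnovers, field-goal percentage points) and their “true” weights are made up, and the point difference is generated from them plus random noise.

```python
import numpy as np

rng = np.random.default_rng(0)
n_games = 500

# Synthetic per-game margins (team minus opponent). The feature choice and
# the "true" weights are invented for illustration, not Berri's estimates.
rebound_margin = rng.normal(0.0, 3.0, n_games)
turnover_margin = rng.normal(0.0, 2.0, n_games)
fg_pct_margin = rng.normal(0.0, 3.0, n_games)  # in percentage points
true_beta = np.array([0.9, -1.1, 1.5])

X = np.column_stack([rebound_margin, turnover_margin, fg_pct_margin])
point_diff = X @ true_beta + rng.normal(0.0, 2.0, n_games)  # add game-to-game noise

# Ordinary least squares with an intercept column.
design = np.column_stack([np.ones(n_games), X])
coef, *_ = np.linalg.lstsq(design, point_diff, rcond=None)
intercept, beta_hat = coef[0], coef[1:]

for name, b in zip(["rebounds", "turnovers", "FG% points"], beta_hat):
    print(f"{name:>10}: {b:+.2f} points of margin per unit")
```

Each fitted coefficient reads the way the post describes: raise the turnover margin by one and the predicted point difference drops by roughly 1.1, holding the other stats fixed. The regression recovers the weights from the outcome; nobody picks them by hand.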

Now, here’s another consideration. In this, and in other branches of science, the data are not “clean”. That is, we scientists (generally) assume that the phenomenon we are observing conforms to a “normal” distribution – that is, there is some true state for the thing we observe (found by taking the average of our observations) and the individual pieces of observation hover around this true state (or average). So there is variation around the mean.

In my research, for example, I can measure neural responses in the olfactory bulb. I use optical indicators of neural activity; essentially, the olfactory bulb lights up with odor stimulation. The more the neurons respond, the brighter things get. The olfactory bulb is separated into these circular structures called glomeruli. Each glomerulus receives connections from the sensory neurons situated in the nose and the output neurons of the olfactory bulb (some other cells are also present, but they aren’t important for this story.)

When a smell is detected by humans (or animals and insects), what we mean is that some chemical from the odor source has been carried, through the air, into the nose and neurons become active (they fire “action potential spikes”). And the pattern of this activity, at the olfactory bulb, is quite similar – but not exactly the same – from animal to animal.

Sometimes, we see fewer responses to the same smell. Other times, we see a few more. Sometimes we see a different pattern from what we expect. Sometimes, we see no responses; this might happen once every 15 animals. That doesn’t take a whole lot away from our general, broad-strokes understanding of how this part of the brain processes smell information. In most cases, these things can be explained technically: the animal was in poor health, or our stimulus apparatus had a leak, or the smell compound was degraded. We know this because we can improve the signal by fixing the equipment or giving the animal a drug to clear up its nose (mucus secretion – snot! – is a problem).

And as a direct analogy to this WP48 vs. “hyperintelligent stats” problem, we find that a complex smell (composed of hundreds of different chemicals) may be “recreated” using a few of these chemicals. There is good empirical evidence this is the case: prepared-food manufacturers and fragrance makers can mimic smells and flavors reasonably well. This is akin to capturing the essence of the smell (or sport) with a few simple chemicals (or box scores). And generally, we don’t even need people to describe to us what they smell to figure this out (i.e. break down game film to create detailed stats). We can simply make them answer a simple question: do these two things smell the same to you, yes or no? Thus “complex” brain processes and decision making can be boiled down to the results of a forced-choice test. Do we lose information? Yes, but everyone realizes this is a start. As we learn more, and new technology becomes available, we can do more and ask more with less effort. Then we will be able to better use the information we have. As far as I know, most statheads have access only to box scores (although nothing stops them from breaking down game film aside from time and money).

But that’s the broad-strokes view. If we get into details (as if we started working with the “hyperintelligent” stat breakdowns), we find that of course there is more going on, and that the differences we see are not only technical issues. For example, the pattern of activity we see differs slightly from animal to animal because the cells that form connections with the olfactory bulb do not hit the same spot. And even if we can use a single chemical to recreate a smell, the result is still different enough that humans can generally tell something is missing. So the other chemicals are in fact detected and contribute information that the brain uses to form the sensation of smell. And we know that the way neurons respond to a single chemical differs from how they respond to a mixture, confirming that additional information is in fact being transmitted.

The important point is that the simple model captures an important part, but not all, of the complex system. One problem with increasing the complexity of a model is overfitting: the model becomes tailored to one small part of the system (or to the noise in one particular data set) rather than the whole. Even game-film breakdown becomes a hindrance if it gives you so many options that you are back where you started. You’d probably avoid focusing on rare events and just concentrate on the things that happen often, which, again, is the point of a simple model.
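Overfitting is easy to show with a toy example. The Python sketch below assumes synthetic data whose true pattern is a straight line plus noise, then fits a two-parameter line and a ten-parameter polynomial (one parameter per data point) using NumPy’s `Polynomial.fit`. The setup is illustrative only and has nothing to do with real basketball data.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(1)

# Synthetic data: the underlying system is a straight line plus noise.
x_train = np.linspace(0.0, 1.0, 10)
y_train = 2.0 * x_train + rng.normal(0.0, 0.3, x_train.size)
x_test = np.linspace(0.05, 0.95, 50)
y_test = 2.0 * x_test + rng.normal(0.0, 0.3, x_test.size)

def mse(model, x, y):
    """Mean squared error of a fitted polynomial on (x, y)."""
    return float(np.mean((model(x) - y) ** 2))

simple = Polynomial.fit(x_train, y_train, deg=1)    # 2 parameters
flexible = Polynomial.fit(x_train, y_train, deg=9)  # 10 parameters, one per point

for label, model in [("degree 1", simple), ("degree 9", flexible)]:
    print(f"{label}: train MSE {mse(model, x_train, y_train):.4f}, "
          f"test MSE {mse(model, x_test, y_test):.4f}")
```

The degree-9 curve threads every training point, so its training error is essentially zero. On fresh points drawn from the same line it typically does worse than the straight line, because it has memorized the noise instead of the pattern.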

The intense breakdown of game film to provide detailed portraits of player effectiveness could be combined with the broad-strokes analysis. A metric like WP48 can tell a coach where a player is deficient. The coach can use the detailed breakdown to figure out why the player isn’t rebounding, passing, shooting well, and so on. That’s where things like defensive pressure, help defense, and positional analysis can be used for further evaluation. And I’m not aware of stat heads arguing otherwise.

Deficiencies of statistical models

That is, the things that models explicitly ignore.

One thing statistical models do not address is the fan’s enjoyment of a player. Actually, I suppose one might simply be able to chart the percent capacity of arenas when a particular player comes to town, but I don’t think that’s what Simmons has in mind. There’s something to be said about how a player scores: Simmons pays tribute to Russell and Baylor, the first players to make basketball a vertical game. He cites Dr. J. as introducing the urban playground style into basketball. He loves talking about the egos of players, especially when players take MVP snubs personally and then dominate the so-called MVP in a subsequent game.

Simmons also offers a rebuttal to PER, adjusted plus/minus, and “Wages of Wins” metrics in his ranking of Allen Iverson: he doesn’t care. It’s sufficient for him that Iverson is a presence on the court. His emotions are acted out as basketball plays. Simmons finds Iverson’s toughness and anger on the court fascinating to watch.

But Simmons does use metrics: the standard box scores. I would ask this: if Iverson didn’t score as much as he did, would Simmons still care? As Berri has noted, the rankings by sportswriters, the salaries given to scorers, and PER rankings all correlate highly with volume scoring (i.e. the points total, not field-goal percentage). Despite the tortured arguments writers might make, and the lip service given to building a lineup with complete players, “good” players are players who score a lot.

However, I should be clear and say that Simmons’s approach does not detract from his defense of his rankings. He uses player and coach testimonies, historical relevance, the visual appeal of their playing styles, sportswriters, and the box scores to generate a living portrait of these players as people. Outside of the box scores, there is enough grist for the mill. I would suggest that it is these arguments that make the whole process fun. Even in baseball, supposedly the sport with the most statistically validated models of player performance (and Berri would argue that basketball players and their contributions to team records are even more consistent), there are enough differences of opinion concerning impact, playing styles, and relevance to confound Hall of Fame/MVP arguments (see Joe Posnanski).

Because Simmons is upfront about his criteria (even if his judgement on each might not be as “objective” as a number), it is fine for him to weight non-statistical arguments for greatness. It’s how he defined the game, just as Berri defined “player productivity” in terms of his WP48 metric. Because Berri publishes in peer-reviewed journals, he needs methods that are reproducible. Science, and in general the peer-review process, is different from writing books or Hall-of-Fame arguments or historical rankings. The implicit understanding of peer review is that the work is technically sound and reproducible. Berri cannot take the chance of publishing a Simmons-like set of criteria and having other sports economists “turn the crank” and come out with different rankings. But Berri can publish an algorithm, and a proper implementation will yield the same results.

Does this mean that Berri is right? Or that a formula is better than Simmons’s criteria? Mostly no. The one time it is “better” is when one is preparing the analysis for peer review. In this case, it is nicer to have a formula, or a process, or a set of instructions, that yields the same result each and every time the experiment is run. In other words, we try to remove our bias as much as possible. Bias here does not mean anything pernicious; it is just a catch-all term for how we think a certain way (with our own gut feelings about the validity of ideas and research directions). Being objective simply means we try to make sure that our interpretation conforms to the data, and that the work is good enough that other researchers come to the same general conclusions.

I think Simmons actually doesn’t need to trash statistics, nor does he need to ignore them. Once he establishes ground rules, he can emphasize or de-emphasize box scores in his evaluation as he sees fit. As it is, I found his arguments compelling. His strength, again, is making basketball history an organic thing. He does his best to eliminate the “you had to be there” barrier and tries to place the players in the context of their time.

Now, one might ask why stats can’t be used to resolve these arguments about all-time greats. Leaving aside the issue of different eras (and frankly, this can be addressed by normalizing performance scores to the standard deviation for a given time period, as Berri does here), there is the issue of what the differences in these metrics mean. In the same article I cited, Berri reports that the standard deviation of WP48 among all power forwards is about .110. His average basketball player has a WP48 of .100. Kevin Garnett, for example, had a WP48 of .443 in 2002-2003. That means Garnett was more than 4x as productive as an average player, but measured in standard deviations he sits only about 3.1 above average ((.443 - .100)/.110 ≈ 3.1).
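The two ways of describing Garnett’s season can be checked with a few lines of Python. The sketch uses only the figures quoted above (an average WP48 of .100 and a standard deviation of .110); the helper names are illustrative. Normalizing a different era would mean repeating the same calculation with that period’s own mean and standard deviation.

```python
# Figures quoted in the post from Berri's article.
AVERAGE_WP48 = 0.100  # average player
SD_WP48 = 0.110       # standard deviation among power forwards

def ratio_to_average(wp48, average=AVERAGE_WP48):
    """How many times the average player's WP48 this value is."""
    return wp48 / average

def z_score(wp48, average=AVERAGE_WP48, sd=SD_WP48):
    """Distance from the average, measured in standard deviations."""
    return (wp48 - average) / sd

garnett = 0.443  # Kevin Garnett, 2002-2003, as quoted above
print(f"ratio to average: {ratio_to_average(garnett):.2f}x")          # 4.43x
print(f"standard deviations above average: {z_score(garnett):.2f}")  # 3.12
```

The ratio and the z-score answer different questions. The ratio compares Garnett with one typical player, while the z-score compares him with the spread of the whole population.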

But how different is Kevin Garnett from a power forward with a WP48 of .343? One might interpret this to mean that Garnett is still nearly 1 standard deviation better than the other player, but it could also mean that their performances fall within 1 standard deviation of each other. Depending on how much each player’s performance varies in a given year compared to his career mean, they could be statistically similar. That is, the difference might be accounted for by the “noise” of slight upticks and downticks in rebounds, assists, steals, turnovers, shooting percentages, and blocks. If you prefer, how about the difference between a .300 hitter and a .330 hitter? Over 500 at-bats, the .300 hitter has 150 hits, and the .330 hitter has 165; the difference is 15 hits over the course of a season. Are the two hitters really that different? The answer depends on the variability of batting average (for the compared players) and on how these numbers look with a larger sample (e.g. over a career of more than 5000 at-bats). The context for the difference must be analyzed.
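Both comparisons can be put in standard-deviation terms. The Python sketch below uses the quoted .110 standard deviation for the WP48 gap. For batting average it assumes a simple binomial model, in which each at-bat is independent with a constant hit probability. That model is a simplification chosen for illustration, not the only way to measure a hitter’s variability.

```python
import math

SD_WP48 = 0.110  # power-forward standard deviation quoted from Berri

# Gap between Garnett (.443) and the hypothetical .343 forward, in SDs.
gap_in_sd = (0.443 - 0.343) / SD_WP48
print(f"WP48 gap: {gap_in_sd:.2f} standard deviations")  # 0.91

def batting_se(avg, at_bats):
    """Standard error of a batting average, assuming independent at-bats
    with a constant hit probability (a simplifying binomial model)."""
    return math.sqrt(avg * (1.0 - avg) / at_bats)

for at_bats in (500, 5000):
    se = batting_se(0.300, at_bats)
    gap = (0.330 - 0.300) / se
    print(f"{at_bats} AB: SE = {se:.4f}, .330 vs .300 gap = {gap:.1f} SE")
```

Under this rough model, a 30-point gap in batting average is only about 1.5 standard errors over a 500-at-bat season but about 4.6 over a 5000-at-bat career. That is why the larger sample matters.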

Here’s another example: let’s assume that Simmons’s list and Berri’s metric turned out similar rankings, perhaps in a different order (one difference is that Iverson would be nowhere near Berri’s top 96). And further, let us assume that the players’ career WP48 scores are all within 1.5 standard deviations of one another. How might Simmons justify breaking with the WP48 rankings?

Let us tackle how Berri would construct his ranking: he would simply list players from highest to lowest WP48. That’s probably because he is in peer-review-article mode. And frankly, if you profess to have a metric, why would you throw it out? You might if, like Simmons, you defined the argument differently. For his Pyramid of Fame rankings, he lists a few criteria that go beyond basketball productivity. Again, historical relevance, player/coach testimony, and the style and flair of the players enter into Simmons’s arguments. So all things being equal, and if the difference in rankings by metric is slight, there really is no reason to weigh the statistics more than any other attribute. Heck, even if the metric differences are large, it wouldn’t matter. Simmons likes his other arguments more anyway.

But if you do talk about the actions on the court, then I believe you are in fact constrained. Of the metrics I mentioned, WP48 correlates highly with point difference and thus with win-loss records. Further, some of the other metrics actually correlate with points scored by players, suggesting that there is little difference between those metrics and simply looking at the aggregate point total. So there are models that do reasonably well in predicting and “explaining” the mechanics of how teams win and lose.

In a way, I think the power of a proper metric is not in ranking similarly “productive” players, but in identifying surprisingly bad or good players. Iverson is an example of the former; Josh Smith (of the 2009-2010 Hawks), of the latter. A metric might not be as powerful a separator of players with similar scores, because their means essentially fall within 1 standard deviation of one another; in essence, they are statistically the same. In that case, it helps to have other information to aid evaluation (and this isn’t easy; as Malcolm Gladwell has written, and Steven Pinker has taken issue with, some measuring sticks are less reliable than others).
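Here is a minimal Python sketch of that idea. It flags players far from average and marks adjacent pairs in the ranking whose gap is under one standard deviation. The players and their WP48 values are hypothetical, and the two-standard-deviation threshold is an arbitrary choice. “Surprising” in the post also depends on a player’s reputation, which this sketch does not model.

```python
# Hypothetical season WP48 values for made-up players (not real data).
players = {"Player A": 0.410, "Player B": 0.360, "Player C": 0.105,
           "Player D": 0.090, "Player E": -0.150}
AVERAGE, SD = 0.100, 0.110  # figures quoted earlier in the post

def classify(wp48, threshold=2.0):
    """Label a value by its distance from the average in SDs (threshold is arbitrary)."""
    z = (wp48 - AVERAGE) / SD
    if z >= threshold:
        return "far above average"
    if z <= -threshold:
        return "far below average"
    return "near average"

def within_one_sd(a, b):
    """True when two values differ by less than one standard deviation."""
    return abs(a - b) < SD

ranked = sorted(players.items(), key=lambda kv: kv[1], reverse=True)
for name, wp48 in ranked:
    print(f"{name}: {wp48:+.3f} ({classify(wp48)})")

# Adjacent players whose gap is under one SD: their order alone says little.
for (n1, v1), (n2, v2) in zip(ranked, ranked[1:]):
    if within_one_sd(v1, v2):
        print(f"{n1} vs {n2}: within 1 SD")
```

The extremes stand out clearly. The order within the A/B pair and within the C/D pair is the kind of distinction the metric cannot settle on its own.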

Another example where statistics are powerful is in determining, in the aggregate, whether player performance varies from year to year. Berri found that it doesn’t vary much, suggesting that the impact of coaching and teammate changes may not be as large as one thinks. However, such a finding in no way precludes coaches and teammates from having an effect on individual players. It just means that these cases are too few to move the mean. Or perhaps it suggests that coaches are not using information properly to make adjustments that meaningfully change player performance. Overall, I suppose, one reason Simmons hates advanced stats and rankings is that he isn’t sensitive to the importance of the standard deviation, and, ironically enough, he applies the mean tyrannically, ignoring that small differences can be statistically insignificant.
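Aggregate consistency and individual change can coexist. The Python sketch below measures year-to-year consistency as the correlation between two seasons for the same players. This is one common approach, not necessarily Berri’s exact procedure. The eight players’ values are invented, and one of them is given a large jump on purpose.

```python
import numpy as np

# Hypothetical WP48 for the same eight made-up players in two consecutive
# seasons (illustrative numbers only). Player index 4 jumps on purpose.
season_1 = np.array([0.30, 0.22, 0.15, 0.10, 0.05, 0.02, -0.03, 0.18])
season_2 = np.array([0.28, 0.24, 0.13, 0.11, 0.17, -0.01, -0.02, 0.16])

# Aggregate consistency: Pearson correlation between the two seasons.
r = float(np.corrcoef(season_1, season_2)[0, 1])

# Individual change: the largest single-player swing between seasons.
changes = season_2 - season_1
biggest = int(np.argmax(np.abs(changes)))

print(f"season-to-season r = {r:.2f}")
print(f"largest change: player {biggest}, {changes[biggest]:+.2f}")
```

The correlation stays high even though one player improves by .12. A coach or teammate could matter a great deal for that player without changing the aggregate picture much.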

But Berri has never pushed his work as a full explanation of the game of basketball. First, he doesn’t present in-game summaries: he only looks at averages over time. There’s nothing in his stat to indicate the ups and downs (i.e. standard deviation in performance) a player experiences from game to game. Even in baseball, hitting .333 does not guarantee a hit every 3 at-bats. It just means that over time, a hitter’s hit streaks and lulls add up to some number that is a third of his at-bats. Berri’s metric (and any other work that proposes to measure player performance) certainly cannot predict what a given box score would be, for a given game, for a given player.
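The point about a .333 hitter can be quantified. The Python sketch below assumes each at-bat is an independent trial with a fixed hit probability of 1/3. That is a binomial simplification, and real hitters’ streaks may depart from it. `prob_hits` is an illustrative helper built on `math.comb`.

```python
from math import comb

def prob_hits(k, at_bats, p):
    """Probability of exactly k hits in a game, assuming independent
    at-bats with a constant hit probability p (binomial model)."""
    return comb(at_bats, k) * p**k * (1 - p) ** (at_bats - k)

p = 1 / 3  # a .333 hitter

hitless = prob_hits(0, 4, p)  # (2/3)**4 = 16/81
print(f"P(0 hits in 4 AB) = {hitless:.3f}")                       # 0.198
print(f"P(exactly 1 hit in 3 AB) = {prob_hits(1, 3, p):.3f}")     # 0.444
print(f"expected hitless 4-AB games in 150: {150 * hitless:.1f}")  # 29.6
```

Even a genuine .333 hitter goes hitless in about one four-at-bat game in five, so a season-long average says little about any single box score.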

Regardless, I do not see a problem with Simmons ranking his players this way. Simply, he values entertainment as much as production. I would say he values the swings in performance just as much, if not more (more on this later). Yes, he says stats do not matter, but of course they do. It’s interesting that the scoring lines he cites in admiration all lead with a high point total or scoring average. And if you can’t shoot, rebound, pass, steal, or block, and you cough the ball up a lot, it wouldn’t matter how pretty you make everything look.

No-no’s

Joe Posnanski has pointed out that whenever someone trashes stats, he tends to offer other supplemental numbers that back up his point. In other words, the disagreement isn’t about statistics per se, but about the distinction between “obvious” stats and “convoluted” stats.

Even if one disagrees with basketball statistics, at least one can believe that statheads came up with a formula first and turned the crank before comparing the readout with their perceptions of players. Hence Simmons blows up when PER or WP48 doesn’t rank his favorites highly.

Simmons approaches this from the opposite direction. He has an outcome in mind and “builds” a stat/model to fit it (like his 42-Club). But he mistakes his way of tinkering for what modelers actually do. Berri arrived at his model by regressing point difference on box-score statistics. It isn’t an arbitrary way of deriving some easy-to-use formulation. The regression coefficients are meaningful: they say that if you increase shooting percentage by this amount, the point difference goes up by that amount. It so happens that points scored by a player did not increase the point difference. And he built it using all players; it’s strange to decide beforehand which players are great and then build a metric around that. Why even bother in the first place?

And for Berri to report these aggregate data differently because Kobe isn’t ranked higher would amount to scientific fraud. But as I noted above, applying these WP48 rankings isn’t as hard and firm a process as Simmons thinks. There is some room for flexibility, depending on what one is trying to accomplish.

In general, I agree that more breakdowns of the game would be useful, in the sense that more data is always nice. The problem, for academics, is that these stats might remain proprietary, which makes them difficult to apply across all teams. Even if we could get all the “hyperintelligent” stat breakdowns from a single team, it is unclear whether other teams would view the breakdown in the same way. Their usefulness for examining general questions about worker (i.e. player) productivity in academic publications becomes less clear. The database ought to help the teams, assuming they are intellectually honest enough to verify that their stats produce a better picture of player productivity and aren’t just impressed by the gee-whiz-ness of it all. My guess is that they won’t be entirely successful, as Simmons still has a job trashing bad GM decisions.

Standard Deviations

Why do I watch sports? For much the same reason Simmons does. He watches over a thousand hours of sports each year, waiting for the chance to see something he has never seen before: something that stretches the imagination and the realm of human physical achievement.

I feel the same way; I am team and sport agnostic, and although I used to follow Boston Bruins hockey religiously, I left that behind in high school. Even though I have lived in Boston since the age of 7, I had not been infected by the Red Sox or Celtics bug (even during their mid-80’s run). I did root for the Red Sox in 2003 and 2004, but that was because of the immense drama of the playoff games against the Yankees, and because of Bill Simmons’s blog that season.

Perhaps I prove Simmons’s point about stat heads; I like to say that I am interested in sports in the abstract. I like the statistical analysis for the same reason Dave Berri pointed out in his books: there is a wealth of data in there to be mined. I thought one good example of the type of research that can come from these data is finding evidence for racial bias in the way basketball referees call games.

However, what got me interested in watching professional sports was Simmons writing about it. Although I didn’t watch football, basketball, or baseball for a long time, I did watch the Olympics and, believe it or not, televised marathons. Partly it was because my wife and I were running, but mostly I saw the track and field type sports as a wonderful spectacle. So it wasn’t that much of a stretch to fall into a stereotypical male activity.

At any rate, I was amazed at Usain Bolt’s performance in the 2008 Summer Olympics. I was disappointed by Paula Radcliffe injuring herself during the Athens Olympics, and then relieved when she won the NYC marathon, setting a new speed record in the process. I rooted for Lance Armstrong to win his seventh Tour. I rooted for the Patriots to get their perfect season. And until the Colts laid down and the Saints lost a couple of weeks ago, I wanted the Colts and the Saints to meet in the Super Bowl, both sporting 18-0 records. I was glad that the Yankees won the World Series, and with that fantasy baseball lineup, I hope they continue to win. I want to see the best teams win, and win often. And yes, I wish the regular season records lined up with the championship winners for a given season. Then we wouldn’t have arguments about best regular season records and the championship winners.

This isn’t because I’m a bandwagon fan; I watch sports now for the same reason that Simmons does: to see the best of the best do great things. But they don’t always do so; sometimes a competitor wants it more, and the best fail. This drama is the power of sports.

And I can see why Simmons argues so passionately against stats. He likes the visceral impact of sports. I can say that Bolt ran a 9.69s 100 m. But it was nothing compared to seeing Bolt accelerate, distance himself from the other runners, and then slow down as he pulled into the finish line. He blew away the competition. My eyes were wide and my mouth hung open: he slowed down! And he was 2 strides ahead of everybody. And he set a new record. Even if Bolt hadn’t set the record, he still made it look easy. On the field, on that particular day, he out-classed his competitors. It is watching the struggle of the competitors (like Phelps winning the 100m fly by 10 milliseconds), on that day, that matters. Over time, if one didn’t watch that particular race, then the line “World Record: Usain Bolt, 100 m, 9.69s” doesn’t quite hit you the same way.

But then, there is this. What if instead of looking at the single race, you looked at the athlete performing in 8 or 20 or 50 events for a year? And at these events, the same set of athletes compete over and over?

Here are some possible outcomes: Phelps and Bolt lose all their other meets, essentially giving us a single transcendent moment; Phelps and Bolt win half their meets; or Phelps and Bolt utterly dominate the field, winning 65% or more of their meets.

For the first case, we would probably admit that the Phelps and Bolt phenomenon was a one-off. For whatever reason, the contingencies (no sports gods or stars aligning here!) lined up such that they accomplished highly improbable feats (but not impossible ones; this distinction is the point of this section). The third case proves our point; they are not perfect, but they sure are good. The second case is a bit trickier: since they are right on the borderline, we need some analysis to help us decide. One way might be to sum up our individual observations about these two. Winning half the time while also giving us a single breathtaking moment might be persuasive. Or one might look at how everybody else did (Phelps and Bolt might have won 50% of the time, but if the remainder is split among their competitors, they have still dominated the field).
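The “still dominated the field” case can be checked against chance. The Python sketch below makes several simplifying assumptions: a hypothetical season of 50 meets, eight equally matched finalists in each, and independent meets. Under those assumptions each finalist would expect to win one meet in eight, and the binomial tail gives the probability of winning 25 or more by luck alone.

```python
from math import comb

def prob_at_least(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

# Hypothetical season: 25 wins in 50 meets against 8-athlete fields.
n_meets, field_size, wins = 50, 8, 25
chance = 1 / field_size  # win share if all finalists were equally good

p_value = prob_at_least(wins, n_meets, chance)
print(f"expected wins by chance: {n_meets * chance:.2f}")  # 6.25
print(f"P(>= {wins} wins by chance) = {p_value:.1e}")
```

Against a baseline of about six wins, a .500 record in an eight-athlete field is far beyond luck, which supports the reading that the athlete dominated even while losing half the time.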

But then what if Bolt and Phelps won 49% of the time, and some other competitor won 50% of the time? What then? Here, criteria are important. Most of the time, we say better meaning, well, something is better. Generally, we aren’t specific about what we mean by it.
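How far apart are 49% and 50%? The Python sketch below assumes the meets are independent and that each athlete has a fixed probability of winning any given meet (a multinomial model). Because two athletes cannot both win the same meet, their win counts are negatively correlated, and the variance of the gap between their win shares is (p1 + p2 - (p1 - p2)^2)/n. The meet counts are hypothetical.

```python
import math

def share_gap_in_se(p1, p2, n_meets):
    """Gap between two athletes' win shares, in standard errors, assuming
    independent meets with fixed win probabilities (multinomial model).
    Var(share1 - share2) = (p1 + p2 - (p1 - p2)**2) / n."""
    se = math.sqrt((p1 + p2 - (p1 - p2) ** 2) / n_meets)
    return (p1 - p2) / se

for n in (50, 100, 1000):
    print(f"{n} meets: 50% vs 49% is {share_gap_in_se(0.50, 0.49, n):.2f} SE apart")
```

Even over 1000 meets, a one-point gap is only about a third of a standard error, so the numbers cannot break the tie. The criteria we choose have to do that.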

In the book, Simmons ranks his top 96 players in a pyramid schematic. He is rather specific about what he wants in a player. And as one expects, he is specific about the intangibles his basketball player should have (basically, basketball sense, i.e. The Secret; whether he made his teammates better; winnability; and whether you would choose him “if your life depended on this one guy winning you a title”). The evaluation of those intangibles, however, is not as precise as he’d like. The advantage here is that one might be able to answer “why” questions. In some cases, Simmons seemingly ranked two players differently while giving them the same arguments (like the consistency of Tim Duncan and John Stockton: somehow, Stockton just rubbed Simmons the wrong way, while Duncan’s consistency makes him the seventh-best player of all time). And his emphasis on projecting Bill Russell’s game into the modern era made it seem like Russell should have ranked lower. On occasion, I was left with the feeling that the arguments did not match the ranking. From what he said about stat inflation and how Wilt didn’t get The Secret, I thought Wilt would be ranked lower than sixth.

Dave Berri has the opposite problem: he has a mathematically defined metric, and when he says better or worse, he means that this metric is higher or lower for one player than for the other. He can further break down this stat to show where a player is good or deficient (whether shooting percentage, blocks, turnovers, fouls, steals, and assists are above or below average). He can tell you the hows, with his model combining these different performance stats into a single number for productivity. But he simply ranks players numerically, without discussing how these differences would look on the court (and one might not be able to see them… it could be one more missed shot or one less rebound every couple of games).

I am amazed that Simmons cannot reconcile eyeball and statistical information. Just about every time Simmons bitches out scorers, he talks about how this player didn’t get “The Secret”. It isn’t about scoring; it’s about having a complete game. It is about making the team better with the skills you have. To top it off, Simmons then says that point getters are one dimensional. You can’t shy away from rebounds. It’s great to have a few steals/blocks. Sure, not every athlete can do it all, and certainly not be as prolific as superstars, but you can’t avoid doing those things.

I’m sure Berri is nodding his head, agreeing with Simmons. Point getting isn’t the same as being an efficient shooter (with at least average field goal and free throw percentages). And you certainly can’t be below average in the other areas if you want to help your team.

But Berri generally writes about the average. Simmons focuses on the standard deviations. He doesn’t just care about the scoring line; he focuses on Achilles-wreaking-havoc-on-the-Trojans type of performances. He loves the stories of Jordan’s pathological competitiveness. In other words, Simmons lives for the outlier moments.

And I think that’s it in a nutshell (and, to borrow a Simmons device, I could have said this 5500 words ago and shortened this review). Simmons views the out-of-the-ordinary performance as transcendent, as the mark of players who wanted something more or had something to prove. He treats the extreme as significant and uses a back story to give the event meaning. That’s fine. It’s also fine for Berri (and stat heads) to treat outliers as noise (possibly) or as irrelevant to the general scope of the model, if they want a model of what usually happens and are not concerned with doing the job of a GM and a coach for free. Each has defined the game he wishes to play.
