Well, I’m not the only one saying this (sadly, it’s behind a paywall):
The melding of science and statistics has often propelled major breakthroughs. Last year’s Nobel Prize in Physics was awarded for the discovery of the accelerating expansion of the universe. That discovery was facilitated by sophisticated statistical methods, establishing that the finding was not an artifact of imprecise measurement or miscalculations. Statistical methods also allowed the trial demonstrating that zidovudine reduces the risk of HIV transmission from infected pregnant women to their infants to be stopped early, benefiting countless children. Statistical principles have been the foundation for field trials that have improved agricultural quality and for the randomized clinical trial, the gold standard for comparing treatments and the backbone of the drug regulatory system.
I spent a little bit of time trying to present ways that scientists and laymen can engage each other. It seems that, in calling for a policy change, whether raising the level of public funding or peddling statistics as a viable career choice, Science should perhaps have made these articles freely available. Otherwise, Marie Davidian and Thomas Louis, the authors of this editorial, are preaching to the choir.
***
This is as good a time as any to present my thoughts on Stephen Baker’s The Numerati. It is a serviceable introduction to the arenas where statistical analyses of large data sets are gaining prominence. Despite the title, the book does not really present the leading scientists and statisticians who are at the forefront of converting our analog lives into computer-friendly numbers. I would also have liked to see this book grapple more with issues such as how non-statisticians should come to terms with the fact that we are all being quantified and analyzed.
The book presents this numerification without judgment. It is simply a description of what is already happening. From Mr. Baker’s matter-of-fact presentation, we can surmise that current uses of behavior quantification mostly serve to market products to us or to track us. Politicians get to slice us into finer demographics; true believers are ignored while swing voters are targeted. Stores entice consumers to join rewards programs; the information that businesses gain is cheaply bought. The debris of our personal lives is vacuumed up by governments intent on identifying the terrorists among us. The workplace becomes more divided, first by cubicle walls and then by algorithms designed to flag malingerers.
Mr. Baker does not dwell on how power resides with those who have access to the information, although most of the researchers seem to think that their analyses will be used by laymen as much as by themselves. He presents two dissenting voices. One is a detective who deploys the latest face recognition software for casinos; the expert has become an advocate for the privacy that citizens deserve, since it can be uncomfortable to receive targeted ads that presume too much about our behavior. The other is Baker himself, but only in the narrow scope of how numerification affects his own industry. He thinks there is value in the role of editors acting as curators of news. Otherwise, that role will fall to the reader, who may be overwhelmed by the number of news items. More likely, that reader will defer to search engines (the very things supplanting editors).
Mr. Baker does not really push this issue, but search engines do not have to be value-neutral. They can very well reflect the political biases of their owners, or the search function itself might be a value-add meant to drive up revenue streams (don’t forget, Google makes money by selling ads). People tend to think of software as objective and free of bias because it is based on algorithms, machine rules, and mathematical models. I think one interesting aspect of numerification is that it in no way dismisses the need for judgment. Judgment is especially important in selecting the mathematical rules to use, the filters and gates one applies to the data, and the interpretation of results. A computer can crank out numbers, but humans decide what formulas to use.
A short while ago, I was discussing this very issue with a director of analytics at a marketing firm. We got to discussing cluster analysis; we both felt that while its results are perfect for what we want to do with our data, there is a surprising amount of ambiguity involved. In MATLAB, one function used for finding groups of data points implements k-means clustering. To use it, you have to specify how many clusters the function should slice the data into. The process itself is straightforward: a number of cluster centers are selected at random, and the algorithm then iteratively assigns each data point to its nearest center and moves each center to the mean of the points assigned to it. Everything about it works as advertised, except for the part where the user needs to tell the program how many clusters there are. Not much help if you are looking for a computational method to find the clusters “objectively.” The director and I moved on to other topics, such as formulating the machine rules and vetting them.
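To make the ambiguity concrete, here is a minimal sketch in Python (using scikit-learn’s KMeans in place of the MATLAB function we were actually discussing). The data and the values of k are invented for illustration; the point is only that the algorithm will happily carve the data into however many clusters you ask for, and never tells you what k should have been.

```python
# Minimal sketch of the "you must pick k yourself" issue, using scikit-learn
# instead of MATLAB's kmeans. The blobs below are synthetic.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
data = np.vstack([
    rng.normal(loc=(0, 0), scale=0.5, size=(50, 2)),
    rng.normal(loc=(5, 5), scale=0.5, size=(50, 2)),
    rng.normal(loc=(0, 5), scale=0.5, size=(50, 2)),
])

# KMeans will return a partition for any k you specify; nothing in the
# function tells you which k is "right."
for k in (2, 3, 6):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(data)
    print(k, "clusters, within-cluster sum of squares:", round(model.inertia_, 1))
```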
Let’s leave aside the loss of dignity and individuality entailed in numerification; the subtle points not addressed in The Numerati are how models are built and how metrics are validated. This touches directly on the things that can go wrong with numerifying society. The most obvious example is bad data – either typos or out-of-date information – leading to misclassification. It’s not identity theft, but the result is the same: some agent attributes some notoriety to the wrong person. The victim gets stuck with a bill or, worse, is labeled a terrorist and detained by the authorities. Another possible error is that the wrong metric is used, leading to even more inefficiency than if the numbers had been ignored. Simply put: are the measures used really the most relevant ones, and how likely are we to settle on the wrong formula?
Dave Berri, a sports statistician, has been a bellwether in this regard. He has spent significant space in two books, The Wages of Wins and Stumbling on Wins, as well as on his website and the Huffington Post, documenting how even people with a vested interest in using statistics do not always come to scientifically consistent conclusions. He is able to use sports statistics to give us insight into the decision-making process. His observations and models, and frankly most models in general, have been met with two criticisms: 1) math models cannot capture something as complicated as basketball, and 2) his findings deviate from existing opinion – that is, his models seem wrong. Answering these criticisms gets at some issues in data mining and correlation analysis that The Numerati avoided.
***
Both objections speak to the confusion people have between determinism and the predictions one can make with a model. First, there are actually few deterministic physical laws. Quantum mechanics happens to be one, but its effects can only be seen in reduced systems – at the level of single electrons. As we include more of the universe, at the scales relevant to human experience, our deterministic laws take on a more approximate character. We begin to model empirical effects rather than derive solutions from first principles (with a few important caveats). The point is that we can use Newton’s Laws just fine in sending our space probes to Jupiter, with the laws modeled after observation. We do not need a unified field theory to figure out how the subatomic particles of the molecules of a spacecraft interact with the like particles making up Jupiter in order to aim.
Models based on empirical findings can only predict events prescribed within the boundaries of observation. This is even more true of statistical models based on data mining. New conditions can arise that invalidate the assumptions (or the previous observations) used to build the model. The worst-case scenario is when some infrequent catastrophe occurs – Nassim Nicholas Taleb’s “black swan” event.
That’s part of the art of working with models. We must understand their limitations as well as their conclusions. As the system becomes more complex, so (generally) do our models. The complexity of our models is linked both to the system and to the precision we require. For example, one can model Texas Hold’em in terms of the probability of receiving a given hand and derive optimal betting strategies. But that ignores the game-theory aspect of the game: players can use information gained during the course of play, bluff, or alter their strategy out of plain ignorance. There are also emotional aspects to play that might lead players to deviate from optimal strategy or miscalculate probabilities. For models that are based on observations, their predictions pertain to the likelihood of outcomes. Over many trials, I would expect the frequency of outcomes to conform to the model, but I cannot predict what the immediate next result will be. It’s the same as knowing that throwing 7s is the most common event when playing craps, while being unable to say whether the next throw will in fact be a 7.
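Since the craps claim is just counting, here is a quick enumeration (a Python aside of my own, not anything from the book): 7 is the single most likely total of two dice, yet it still comes up less than a fifth of the time, which is exactly why the model says nothing definite about the next throw.

```python
# Enumerate all 36 equally likely outcomes of two dice and count each total.
from collections import Counter
from itertools import product

counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))
for total, ways in sorted(counts.items()):
    print(total, ways, "/ 36 =", round(ways / 36, 3))
# 7 occurs 6 ways out of 36 -- the most likely total, but still only ~17%.
```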
So why build these models? Because the process allows us to make our ideas explicit. It allows us to specify things we know, things we wish we knew, and possibly to identify things we were ignorant of. Let us use sports as an example. Regardless of what we think about statistics and models, all of us already have one running in our heads. In the case of basketball, we can actually see this unspoken bias: general managers, sportswriters, and fans tend to name players as above average the more points per game they score. This is without consideration of other contributions, like steals, blocks, turnovers, fouls, rebounds, and shooting percentage. We know this because of empirical data: the pay scale of basketball players (controlled by GMs), MVP voting (by sportswriters), and All-Star selections (by coaches and the fans). The number of points scored best explains why someone is chosen as a top player.
The upshot is that humans have a nervous system built to extract patterns. This is great for creating heuristics – general rules of thumb. Unfortunately, we are influenced not only by the actual frequency of events but also by our emotions. Thus we do not actually have an objective model, but one modified by our perceptions. In other words, unless we take steps to count properly – that is, to create a mathematically precise model – we risk giving our subjective biases the veneer of objectivity. This is worse than having no model; we would place our confidence in something that will systematically give us wrong answers, rather than realizing we simply don’t know.
There are even more subtle problems with model building. Even having quantifiable events and objective observational data does not guarantee that one will have a good model. This problem can be seen in the NFL draft; the predictors that coaches use – this time published and made explicit, such as Wonderlic scores and NFL combine measurements – do not have much value in identifying players who will be average, let alone superstars. Berri has presented a lot of data on this, ranging from original research published in economics journals to more informal channels such as his books and web pieces. So how do we conclude that we have a good model?
***
Here is where it gets tricky. In the case of sports, we can identify a good output metric, such as a team’s win-loss record. If you start from scratch, you might argue that a winning team must score more points than its opponent. You would test this by performing a simple linear regression analysis, and you would find that it is in fact the case. As a matter of fact, the first model is an obvious one: score more points than your opponents and you win. So obvious that it sounds like the definition of a win. In this case, it becomes apparent that the win-loss record is a “symptom,” a reflection of the fact that in a given game, players do not make wins, but they do make points. Points scored and points against (the point differential) become the more elemental quantities.
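For readers who want to see what “performing a simple linear regression” looks like in practice, here is a hedged sketch in Python. The season data are simulated stand-ins, not real NBA numbers; the shape of the analysis, regressing wins on point differential, is the point.

```python
# Sketch of the regression described above: regress season wins on per-game
# point differential. All numbers are simulated for illustration only.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
point_diff = rng.normal(0, 4, size=30)                 # per-game point differential
wins = 41 + 2.5 * point_diff + rng.normal(0, 3, 30)    # noisy linear relationship

slope, intercept, r, p, se = stats.linregress(point_diff, wins)
print(f"slope = {slope:.2f} wins per point of differential, r^2 = {r**2:.2f}")
```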
This isn’t too novel a finding; most sports conform to some variant of Bill James’s Pythagorean expectation (named as such because its terms resemble the Pythagorean relation a^2 + b^2 = c^2). If we start from the assumption that everything a player does to help or hurt his team ultimately shows up in points, then we can begin to ask whether all points are equal and whether other factors help or prevent teams from scoring. As it happens, Berri has done a credible job of building a simple formula using basketball box scores (rebounds, personal fouls, shooting percentage, assists, turnovers, blocks, and steals). Here, we have obvious end-goal measures: point differential and, ultimately, win-loss record.
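For reference, the Pythagorean expectation mentioned above takes a very simple form. The exponent is sport-specific (James used 2 for baseball; basketball variants use much larger exponents), so treat the value below as illustrative rather than definitive.

```python
def pythagorean_win_pct(points_for: float, points_against: float, exponent: float = 2.0) -> float:
    """Expected winning percentage from points scored and points allowed."""
    pf, pa = points_for ** exponent, points_against ** exponent
    return pf / (pf + pa)

# Example: a team that outscores its opponents 105 to 100 on average.
print(round(pythagorean_win_pct(105, 100), 3))  # ~0.524 with an exponent of 2
```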
But what if there is no obvious standard by which to judge the effectiveness of our models? That is the situation encountered by modelers who try to identify terrorists or to increase worker productivity. Frankly, the outcomes are confounded by the fact that terrorists take steps to hide their guilt, and workers might work much harder at giving the appearance of productivity than at actually doing work. In this case, deciding which parameters are significant predictors is only half the job; one might need to perform an empirical investigation just to establish the outcome measure. The irony is that despite the complicated circumstances of a sports contest, the system remains well specified and amenable to analysis. Life, then, is characterized by more parameters and variables, less well-defined outcomes, and much greater noise in the measurements.
Nevertheless, some analysis can be done. Careful observation will allow us to classify the most frequent outcomes. This is most clear in the recommendations from Amazon: “Customers who purchased this also bought that.” If that linkage passes some threshold, it is to Amazon’s benefit to suggest it to the customer. Thus the parallels between basketball (and sports) statistics and the numerification of life are clear. The key is to find a standard for performance. For a retailer, it might be sales. For a biotech company, it could be the number of candidate drugs entering Phase I clinical trials. Some endpoints might be fuzzier (what would one say makes a productive office worker? The ratio of inbox to outbox?). Again, identifying a proper standard is hard, combining both art and science. This is another point ignored in Baker’s book: there are many points at which humans exert an influence in modeling.
Basketball can again serve as an illustration. The action is dynamic, fast-paced, and has many independent elements (that is, the 10 players on the court). However, just because we perceive a system to be complex does not imply that the model itself needs to be. Bill Simmons, a vocal opponent of statistics in “complicated” games like basketball, makes a big deal about “smart” statistics – like breaking down game footage into more precise descriptions of action, such as whether a shooter favors driving to one side over the other, whether he has a higher shooting percentage from the corners, how far he plays off his man, and so on. In other words, Simmons would say that there is a lot of information ignored by box scores; ergo, they cannot possibly be of use to basketball personnel. Yet as Berri and colleagues have shown, box scores do provide a fair amount of predictive value – with regard to point differential.
What critics like Simmons miss is that these models most definitely describe the past, that is, what the players have done, but the future is quite a bit more open-ended. These critics confuse “could” with “will.” A model’s predictive value depends not only on its correlation with the standard but also on how stable it is across time. Again, despite the rather complicated action on the court, basketball players’ performance, modeled using WP48, is fairly consistent from year to year. Armed with this information, one might reasonably propose that LeBron James, having performed at a certain level last year, might reach a similar level this year.
As any modeler realizes, that simple linear extrapolation ignores many other variables. One simple confound is injury. Another is age. Yet another is whether the coach actually uses the player. In other words, the critics tend to assume past performance equals future returns. The statistical model, even WP48, does not allow us to say, with deterministic accuracy, how a player will perform from game to game, let alone from year to year. At the same time, the model does not place a “cap” on a player’s potential. Used judiciously, it is a starting point for coaches and GMs to identify the strengths and weaknesses of their players, freeing them to devise drills and game strategies that can improve player performance. Interpreted in this way, WP48 allows coaches to see whether their strategies have an impact on overall player productivity, which should lead to more points scored and fewer points given up.
How would we deal with competing models? The standard of choice in sports – the point differential – also allows us to compare Berri’s formula with other advanced statistics. Berri’s “Wins Produced Per 48 minutes” (WP48) stat correlates with point differential, and hence wins. Among many competing models, John Hollinger has presented a popular alternative, the Player Efficiency Rating (PER). PER is a proprietary algorithm and, by all accounts, “complicated.” That’s fine, except Berri showed that the rankings generated by PER differ little from ranking players according to their average points scored per game. In other words, you can get the same performance as PER by simply listing a player’s points-per-game stat. Interestingly, points-per-game has a lower correlation with point differential than WP48 does: by this measure against the standard, simply scoring points does not actually lead to wins. On an intuitive level, this makes sense, because you also need to play defense and keep the opponents from scoring more than you do.
A shrewd reader might also have realized that there can be “equivalent” models. This was demonstrated by showing that two metrics are highly correlated with each other (such as points-per-game and PER). Coupled with correlation to our standard, we now have a technique for comparing models both on how well they perform and on whether we have redundant formulas. This is useful; if we have two alternatives that tell us the exact same thing, wouldn’t we rather use the simpler one?
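Here is a hedged sketch of those two checks: correlate each metric with the standard (point differential) and correlate the metrics with each other to spot redundancy. The variables below are synthetic stand-ins with made-up relationships, not actual WP48, PER, or points-per-game data.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200
point_diff = rng.normal(0, 5, n)                      # the "standard"
wp48_like = 0.8 * point_diff + rng.normal(0, 3, n)    # metric tied closely to the standard
ppg_like  = 0.3 * point_diff + rng.normal(0, 5, n)    # metric tied weakly to the standard
per_like  = 1.1 * ppg_like + rng.normal(0, 0.5, n)    # nearly a rescaling of ppg_like

def corr(a, b):
    return np.corrcoef(a, b)[0, 1]

print("vs standard:", round(corr(wp48_like, point_diff), 2), round(corr(ppg_like, point_diff), 2))
print("redundancy check (per_like vs ppg_like):", round(corr(per_like, ppg_like), 2))
```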
Recently, an undergraduate student undertook a project to model PER, resulting in a linear equation that allowed for analysis of the weightings that John Hollinger most likely used. In turn, this lays bare the assumptions and biases that Hollinger used in constructing his model. An analysis of the simplified PER model suggests that PER is dominated by points scored. All the other variables in PER only give the pretense of adding information. There are underlying assumptions and factors that prove overwhelming in their effects. But this isn’t such a novel finding, given PER’s suspiciously high correlation with points-per-game (and lower correlation with point differential). In this sense, then, “good” only implies correlation with the standards the modelers used. It isn’t “good” in the sense of matching what we feel a good model should look like.
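Roughly speaking, that student project amounts to treating the proprietary rating as the dependent variable and regressing it on box-score statistics to back out the implied weights. The sketch below simulates the idea end to end; the black-box weights are invented, and recovering a points coefficient that dwarfs the others is the “dominated by scoring” signature.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500
points, rebounds = rng.normal(15, 6, n), rng.normal(6, 3, n)
assists, turnovers = rng.normal(3, 2, n), rng.normal(2, 1, n)

# Pretend this is the proprietary rating we only observe as a single number.
black_box = 1.0 * points + 0.2 * rebounds + 0.3 * assists - 0.4 * turnovers + rng.normal(0, 1, n)

# Ordinary least squares recovers the implied weights.
X = np.column_stack([points, rebounds, assists, turnovers, np.ones(n)])
weights, *_ = np.linalg.lstsq(X, black_box, rcond=None)
print("implied weights (pts, reb, ast, tov, intercept):", np.round(weights, 2))
```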
***
I’ve been writing essays trying to help non-scientists deal with scientific findings. When reporters filter research, much information gets trimmed. Emphasis is usually given to conclusions; the problem is that good science is a direct function of the methods. Garbage in, garbage out still holds, but bad methods will turn gold into garbage as well.
The paper I will discuss next highlights this issue: correlation and causation are two different beasts, and mistaking the two can take a very subtle form. Venet and colleagues recently published an article* in PLOS Computational Biology showing that, even when care is taken to identify the underlying mechanism of disease, the mechanism of disease pathology may not prove to be a specific enough metric to help clinicians diagnose the disease. They write,
Hundreds of studies in oncology have suggested the biological relevance to human of putative cancer-driving mechanisms with the following three steps: 1) characterize the mechanism in a model system, 2) derive from the model system a marker whose expression changes when the mechanism is altered, and 3) show that marker expression correlates with disease outcome in patients—the last figure of such paper is typically a Kaplan-Meier plot illustrating this correlation.
This is essentially the same method other mathematicians and modelers will use to identify target markets, demographics, terrorists, athletic performance, and what have you. In this case, one would assume that the wealth of research in breast cancer will yield many “hard” metrics by which one can identify a patient with the disease. Venet and colleagues show that this is not the case; the problem is,
… meta-analyses of several outcome signatures have shown that they have essentially equivalent prognostic performances [35], [36], and are highly correlated with proliferation [7]–[8], [37], a predictor of breast cancer outcome that has been used for decades [38]–[40].
This raises a question: are all these mechanisms major independent drivers of breast cancer progression, or is step #3 inconclusive because of a basic confounding variable problem? To take an example of complex system outside oncology, let us suppose we are trying to discover which socio-economical variables drive people’s health. We may find that the number of TV sets per household is positively correlated with longer life expectancy. This, of course, does not imply that TV sets improve health. Life expectancy and TV sets per household are both correlated with the gross national product per capita of nations, as are many other causes or byproducts of wealth such as energy consumption or education. So, is the significant association of say, a stem cell signature, with human breast cancer outcome informative about the relevance of stem cells to human breast cancer?
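The confounding structure in that example is easy to simulate. The sketch below (my own toy numbers, nothing from the paper) generates a hidden common cause standing in for national wealth; the two downstream variables end up strongly correlated even though neither causes the other.

```python
import numpy as np

rng = np.random.default_rng(4)
gdp_per_capita = rng.lognormal(mean=9, sigma=1, size=300)                   # hidden common cause
tv_sets = 0.00005 * gdp_per_capita + rng.normal(0, 0.2, 300)                # tracks wealth
life_expectancy = 55 + 3 * np.log(gdp_per_capita) + rng.normal(0, 2, 300)   # also tracks wealth

r = np.corrcoef(tv_sets, life_expectancy)[0, 1]
print(f"correlation between TV sets and life expectancy: {r:.2f}")  # positive, yet not causal
```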
Scientific research is powerful because of its compare-and-contrast approach – explicit comparison of a test case with a control case. We can take a sick animal or patient, identify the diseased cells, and do research on them. The research generally revolves around taking two identical types of cells (or animals, or conditions) that differ in one crucial way. In the case of cancer, one might reasonably select a cancer cell and compare it to a normal cell of the same type. In this way, we can ask how the two differ.
If the controls are not well designed, then one might really be testing for correlation, not causation. As one can imagine, even if only a few things go wrong, the effects might be masked by many disease-irrelevant processes – this is what we would call noise. Venet and colleagues looked at studies that used gene expression profiles. The idea is that a diseased cell will have some different phenotype (i.e., “characteristic”), whether it be in the genes it expresses, the proteins it uses, its responses to signals from other cells, its continual growth, its ignoring of cell-death signals, and so on. One characteristic of cancerous cells is that they grow and divide. The signature that researchers had focused on was simply the set of genes expressed by cancer cells, which presumably would not be expressed in non-cancer cells. Remember this point; it becomes important later.
Further, it was reasonable to hypothesize that the power of this test would grow as more and more genes from the diseased state were incorporated into the diagnostic. Whatever differed between cancer and normal cells should, in theory, be usable as either a diagnostic marker or a potential target for drug action. But as Venet and colleagues point out, many genes actually play a role in the grow-and-divide cycle (“proliferation”) of normal cells. These genes may show increased expression in cancer cells, and their elevated levels will flag them as different from normal cells, but that isn’t enough; the underlying attribute of these genes reflects an aberrant state only by degree. Even normal cells proliferate; it so happens that the genes involved in this process are relatively numerous. Thus there are two problems. One is that the markers are no good because they do not provide enough separation from the normal state. The second, related problem is that if one were to pick a number of genes at random to use as a diagnostic (in this case for breast cancer), one will likely end up with genes related to proliferation, since these genes are enriched. Even a random metric will show correlation with breast cancer outcome, since chances are a gene related to proliferation will be chosen. The problem is that the metric assumed that cancer cells have a gene expression profile consisting of genes expressed only in cancer cells (an on-off rather than a more-or-less distinction).
In the words of Venet and colleagues,
Few studies using the outcome-association argument present negative controls to check whether their signature of interest is indeed more strongly related to outcome than signatures with no underlying oncological rationale. In statistical terms, these studies typically rest on [the null hypothesis] assuming a background of no association with outcome. The negative controls we present here prove this assumption wrong: a random signature is more likely to be correlated with breast cancer outcome than not. The statistical explanation for this phenomenon lies in the correlation of a large fraction of the breast transcriptome with one variable, we call it meta-PCNA, which integrates most of the prognostic information available in current breast cancer gene expression data. (emphasis mine)
The method was simple; Venet and colleagues compared previously published gene expression profiles vetted for breast cancer diagnosis, gene signatures from other biological processes (such as “social defeat in mice” and “localization of skin fibroblasts”), and random selections of genes from the human genome. All these metrics, regardless of their oncological significance, showed “predictive” value: expression of these gene sets was associated with patients’ breast cancer outcomes. Hence the title of the paper, “Most Random Gene Expression Signatures Are Significantly Associated with Breast Cancer Outcome.”
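A toy version of the paper’s negative-control idea can be simulated in a few lines. Below, a large fraction of genes track a latent “proliferation” variable that also drives outcome; scoring patients with completely random gene signatures then yields far more “significant” associations than the 5% one would naively expect. This is my own illustration under those stated assumptions, not the authors’ code or data.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
n_patients, n_genes = 200, 2000
proliferation = rng.normal(size=n_patients)            # latent driver (think meta-PCNA)

# Half the genes are loaded on proliferation; the other half are pure noise.
loadings = np.where(rng.random(n_genes) < 0.5, rng.uniform(0.5, 1.5, n_genes), 0.0)
expression = np.outer(proliferation, loadings) + rng.normal(size=(n_patients, n_genes))
outcome = proliferation + rng.normal(scale=0.5, size=n_patients)   # worse with more proliferation

significant = 0
for _ in range(100):
    signature = rng.choice(n_genes, size=50, replace=False)        # a "random gene signature"
    score = expression[:, signature].mean(axis=1)
    _, p = stats.pearsonr(score, outcome)
    significant += p < 0.05
print(f"{significant}/100 random signatures associated with outcome at p < 0.05")
```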
***
How do we deal with this study? Does it suggest that biomarkers are a waste? No. For one, the only test presented in this paper is one where a randomized signature is compared to a breast-cancer diagnostic based on gene expression. That a specific test does no better than chance only allows us to conclude that the test is deficient in some way. The point is that the existing test may be keying in on “proliferation,” except that Venet and colleagues showed that removing such genes did not worsen the performance of the randomized gene set in “diagnosing” breast cancer. It may be that the gene expression data have not been sufficiently de-noised. One can certainly try to clean up the model, but new tests must be shown to differ from the baseline (or control) level of performance of a randomized gene set.
And how does this relate to the earlier points about basketball statistics? Only in that modeling effectiveness depends on how good the standard is, how well the variables are characterized, and how independent the relationships among the variables really are. Having testable hypotheses and experiments helps too (although it seems a shame that gene expression profiles may not prove to be the key factor in this specific scenario). Even leaving aside the question of whether a model is good or bad, being able to show statistical correlation between models is powerful. Earlier, I wrote that Dave Berri showed that John Hollinger’s PER model differs little from simply looking at points-per-game (in fact, the correspondence is nearly one to one). That conclusion came from the same types of statistical analyses that allowed Venet and colleagues to show the equivalence between existing “breast cancer” gene signatures and randomized ones. While correlation does not imply causation, in the case of models, correlations can certainly help us identify equivalent models carrying redundant information.
Why writers should make use of all tools at their disposal
Joe Posnanski has written another thoughtful piece on the divide between writers of a statistical bent and those who prefer the evidence of their eyes. I highly recommend it; Posnanski distills the arguments into one about stories. Do statistics ruin them? His answer is no. Obviously, one should use statistics to tell other stories, if not necessarily better ones. He approached this by examining how one statistic, “Win Probability Added”, helped him look at certain games with fresh eyes.
My only comment here is that I’ve noticed, on his and other sites (such as Dave Berri’s Wages of Wins Journal), that one difficulty in getting non-statisticians to look at numbers is that they tend to desire certainty. What they usually get from statisticians, economists, and scientists are reams of ambiguity. The problem comes not when someone is able to label Michael Jordan as the greatest player of all time*; the problem comes when one is left trying to rank merely great players against each other.
* Interestingly enough, it turns out the post I linked to was one where Prof. Dave Berri was defending himself against a misperception. It seems writers such as Matthew Yglesias and King Kaufman had mistaken Prof. Berri’s argument using his Wins Produced and WP48 statistics, thinking that Prof. Berri wrote that other players were “more productive” than Jordan. To which Prof. Berri replied, “Did not,” but he also gave some nuanced approaches to how one might look at the statistics. In summary, Prof. Berri focused on how far Jordan’s performance stood above that of his contemporary peers.
The article I linked to about Michael Jordan shows that, when one compares numbers directly, care should be taken to place them into context. For example, Prof. Berri writes that, in the book The Wages of Wins, he devoted a chapter to “The Jordan Legend.” At one point, though, he writes that
in 1995-96 … Jordan produced nearly 25 wins. This lofty total was eclipsed by David Robinson, a center for the San Antonio Spurs who produced 28 victories.
When we examine how many standard deviations each player is above the average at his position, we have evidence that Jordan had the better season. Robinson’s WP48 of 0.449 was 2.6 standard deviations above the average center. Jordan posted a WP48 of 0.386, but given that shooting guards have a relatively small variation in performance, MJ was actually 3.2 standard deviations better than the average player at his position. When we take into account the realities of NBA production, Jordan’s performance at guard is all the more incredible.
If one simply looked at the numbers, it does seem like a conclusive argument that Robinson, having produced more “wins” than Jordan, should be the better player. The nuance comes when Prof. Berri places that into context. Centers, working closer to the basket, ought to have more high-percentage shooting opportunities, rebounds, and blocks. His metric of choice, WP48, takes these into consideration. When one then looks at how well Robinson performed relative to his proper comparison group (i.e., other centers), we see that Robinson’s performance, exceptional when set against other positions, is not beyond the pale when compared to other centers. However, Jordan’s performance, when compared to other guards, shows him to be in a league of his own.
That argument was accomplished by taking absolute numbers (generated for all NBA players, for all positions) and placing them into context (comparing to a specific set of averages, such as by position.)
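The comparison is just a z-score computed within each position group. The WP48 values below are the ones Berri quotes; the position means and standard deviations are hypothetical placeholders I chose so that the arithmetic reproduces the 2.6 and 3.2 figures, since the actual group statistics are not given in the post.

```python
def z_score(value: float, group_mean: float, group_sd: float) -> float:
    """How many standard deviations a value sits above its comparison group."""
    return (value - group_mean) / group_sd

# Hypothetical position averages and spreads (not Berri's actual figures).
print(round(z_score(0.449, group_mean=0.119, group_sd=0.127), 1))  # Robinson vs. other centers -> ~2.6
print(round(z_score(0.386, group_mean=0.093, group_sd=0.092), 1))  # Jordan vs. other shooting guards -> ~3.2
```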
This is where logic, math, and intuition can take you. I don’t think most people would have trouble understanding how Prof. Berri constructed his argument. He tells you where his numbers came from and why there might be issues with going against “conventional wisdom,” and in this case, the way he structured his analysis resolved the difference (it isn’t always the case that he’ll confirm conventional wisdom – see his discussions of Kobe Bryant).
However, I would like to focus on the fact that Prof. Berri’s difficulties came when his statistics generated larger numbers for players not named Michael Jordan. (I will refer people to a recent post listing a top 50 of NBA players on the Wages of Wins Journal.*)
* May increase blood pressure.
In most people’s minds, that clearly leads to a contradiction: how can this guy, with smaller numbers, be better than the other guy? Another way of putting this is: differences in numbers always matter, and they matter in the way “intuition” tells us.
In this context, it is understandable why people give such significance to 0.300 over 0.298. One is larger than the other, and it’s a round number to boot. Over 500 at-bats, the difference between a .300 hitter and a .298 hitter translates to one hit. For most people who work with numbers, such a difference is non-existent. However, if one were to perform “rare-event” screening, such as for cells in the bloodstream marked with a probe that lights up for cancer cells, then a difference of 1 or 2 might matter. In this case, the context is that, over a million cells, one might expect to see, by chance, 5 or so false positives in a person without cancer. In a person with cancer, that number may jump to 8 or 10.
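The arithmetic behind both examples is small enough to write out; the screening counts are the illustrative ones from the paragraph above, not measurements.

```python
# Batting average: .300 vs .298 over 500 at-bats.
at_bats = 500
hits_300 = round(0.300 * at_bats)   # 150 hits
hits_298 = round(0.298 * at_bats)   # 149 hits
print(hits_300 - hits_298)          # a single hit across the whole sample

# Rare-event screening: with counts this small, a handful of events is the signal.
false_positives_healthy = 5    # per million cells, by chance, in a person without cancer
positives_cancer = 9           # per million cells in a person with cancer (8-10 in the text)
print(positives_cancer - false_positives_healthy)
```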
For another example: try Bill Simmons’s ranking of the top 100 basketball players in his book, The Book of Basketball. Frankly, a lot of the descriptions, justifications, arguments, and, yes, statistics that Simmons cites look similar. However, my point here is that, in his mind, Simmons’s ranking scheme matters. The 11th best player of all time lost something by not being in the top 10, but he is still better off than the 12th best player. Again, as someone who works with numbers, I think it might make more sense to just class players into cohorts. The interpretation here is that, at some level, any group of 5 (or even 10) players ranked near one another is practically interchangeable in terms of practicing their craft. The difference between two teams of such players is only meaningful to people forced to make predictions, like sportswriters and bettors. With that said, if one is playing GM, it is absolutely valid to put a team of these best players together based on some aesthetic consideration. It’s just as valid to simply go down a list and pick the top 5 players as ordered by some statistic.* If two people pick their teams in a similar fashion, then it is likely a crap shoot as to which will be the better team in any one-off series. Over time (like an 82-game season), such differences may become magnified. Even then, the win difference between the two teams may be 2 or 3.
* Although some statistics are better at accounting for variance than others.
How this leads back to Posnanski is as follows. In a lot of cases, he does not just simply rank numbers; he is, in part, a writer and storyteller. The numbers are not the point; the numbers illustrate. Visually, there isn’t always a glaring difference between them, especially when one looks at the top performances.
Most often, the tie-breaker comes down to the story, or, rather, what Posnanski wishes to demonstrate. He’ll find other reasons to value one performance over another. As for the Posnanski post I mentioned, I don’t think the piece would have made as good a story had it ended differently, even if it would have highlighted his argument just as well.
Franzen and Picoult, yet again
Back when Oprah Winfrey selected Jonathan Franzen’s The Corrections, I saw a distinct lack of graciousness from various authors and book critics. As I remember it, the reaction was almost one of dismay and outrage that she would drag a piece of literary fiction through the mud that constitutes the low-brow mainstream. There also seemed to be an undercurrent of snobbery directed at Winfrey. She had previously chosen mainstream potboilers and melodramas; selecting Franzen had the appearance of Winfrey ‘trying’ to seem smart or high-brow.
As if a woman who built a billion dollar media company from nothing lacks the intelligence or emotional acumen to understand literary fiction. As if she needed to justify why she veered from choosing another mass-market novel about a broken romance or an issue. As if her business sense couldn’t translate into her appreciating Literature. As if she needed the pretension of reading Literature to convince anyone that she has a rich, considered inner life.
Franzen, I am sure, will take his new opportunity to address the flap over The Corrections. He has even made some statements about it already, blaming his lack of experience in dealing with the exposure. Sure. Whatever. I do give him some credit; I distinctly remember a lot of other people slapping down Oprah, but nothing so bad coming from him.
I continue to detect this vein of elitism coming from various poison pens today. This time, at least the arguments are carried by authors.
I will be clear here; I have never read a work of so-called Literary fiction that was difficult in an intellectual sense. No words stump me; no metaphor goes unnoticed or misunderstood; no linguistic fireworks go unappreciated. I appreciate the talent, skill, and craft that go into beautifully constructed novels. I understand the themes and issues that are the reasons for an author to write. I love complex characters who straddle the gray of living in the world. I like denouement and dramatic closure, which I do not confuse with a tidy, happy ending where all problems are resolved (see Peter F. Hamilton’s The Evolutionary Void for this. It is a three-volume space opera and contains a novel within a novel. There’s a lot going on. The series boils down to a happy ending, for everybody, in the last 2 or 3 pages. That struck a wrong note with me. But it’s still a fantastic read.) I also understand that writing fiction is not my forte.
A novel is never the intellectually difficult exercise that science is, for the reader. Literature isn’t rocket science. It isn’t even a social science. This is not a criticism so much as an observation. The novel embraces life in its messy, tangled glory. The scientist strives to tease out the role specific parts play in creating that mess.
Both are difficult, but in different ways. Literature is difficult as an act of creation; science is difficult in its comprehension. In Literature, asides, digressions, and verbosity, when done well, contribute to the greatness of the work. In a way, writers make the text hard, but in an aesthetically pleasing way. In science, the descriptions and discussion are stripped bare, because the ideas, assumptions, and experiments are already convoluted. Each assumption is built upon a foundation of many other ideas, all linked to the strength of the experiments addressing them. In many cases, the experiment at hand addresses some inadequacy or nuance in a previous paper that may open up new lines of inquiry. To make things any harder to understand is to waste a scientist’s time. Either way, badly written novels and scientific papers meet the same fate: thrown against a wall in disgust and then ignored.
And the phrase ‘novel of ideas’ annoys me. Apparently these Literary authors – and the critics who set themselves up as professional connoisseurs of Literature – have done a great job creating a sandbox from which genres are excluded. So we get stilted prose about a white, male asshole who behaves badly, observes the shit leading to his situation, and then keeps all his snarky observations to himself while never making a mental connection with his (usually sexy) significant other. And so the true novel of ideas, found in science fiction, is ignored.
I am sure that I just conjured visions of spaceships, phasers, droids, and Death Stars. The sci-fi I refer to is the branch known as hard science fiction – for example, Stephen Baxter. This type of novel is a fantastic extrapolation of current state-of-the-art science. Admittedly, one-dimensional sci-fi reads like either a Star Trek episode or a technical manual, but the best sci-fi actually examines the human condition in the context of new technological and social environments. It is an extension of the basic premise of what white, male literary authors write about. Instead of some recognizable human event, some sci-fi authors are interested in placing recognizable, human characters in unfamiliar confines (I think P.D. James’s The Children of Men is a good example of this). And yes, a Baxter novel, a William Gibson novel, a Charlie Stross novel, a Margaret Atwood novel, and especially a Neal Stephenson novel provide more raw ideas than most literary novels hope to capture.
Even during my essay on Medium Raw, I was really thinking of this divide between what the so-called professional critics and “serious” chefs value and what appeals to the public. I do find Literary critics and authors (and ultra-serious chefs and food writers) to be pretentious, as if what they do is so hard to understand (I recognize that it is hard to write a novel and to create new dishes. But to understand a novel or to enjoy food? No.) Theirs is elitism without merit. While talented, the degree to which their talent engenders appeal depends on the fancies of the buying public. This is true because everybody is selling to the public now, not a few pricey artisanal items to the extremely wealthy. The fact that some authors (or pop stars, or movies) get all the sales (or ratings) does not mean that non-blockbuster authors are no good. Of course they are good. Unfortunately, most people focus on the big winners (like a Stephen King or a James Patterson), but there ought to be enough good writers occupying the midlist who deserve some critical analysis or exposure.
I think this is a point that Jennifer Weiner and Jodi Picoult were trying to make in the Huffington Post interview. It seems ludicrous to assume that if an author makes money, he can’t possibly be good. By the same token, just because a writer continues to starve does not confer any status; sure, he loves writing and sacrifices for his art, but perhaps he continues to suffer because he is not all that good. As Koa Lani pointed out in her rebuttal, even if every author profiled fit the “white, male, from Brooklyn” stereotype that Weiner and Picoult satirized, it may be that the profiled and acclaimed authors deserve the adulation. I do not see the two points as contradictory: 1) mainstream literature probably won’t field as many high-impact novels and writers, but they are there, and 2) generally, writers who get profiled deserve it, even if others who also deserve the press do not get it.
What I find strange is that everyone accepts that there are so few good writers worthy of a professional connoisseur. Here’s the problem: I’m never sure whether the critics like a book sincerely or whether it is a pose. When I was reading Bourdain’s Medium Raw, he made similar points about food critics. It seems strange to him that critics have a death-watch culture, where, once a chef is proclaimed to be the best cook ever, everyone scrutinizes his every move, pouncing on the moment he begins his slide. It really is just snobbery, rather than any sincere appreciation of the food, that drives these people. Just as these food critics wish to glow in the luster of their “discovery,” so too must they exact a tax on said chef’s fall from the summit. There are such enthusiasts and critics in every medium (movies, TV – from which the phrase “jump the shark” was derived, music – please see Nick Hornby’s Juliet, Naked, and books), and because they do not create, they nominate themselves as arbiters over those who do. As if an opinion of a book were somehow as important as the book itself, or even as a discussion of the ideas contained within (the first point is discussed in Mark Helprin’s Digital Barbarism). These poseurs wish to be the first to trumpet talent and the first to sound the end.
It wouldn’t astound me if critics are affected by what their peers think (no one wants to miss a Franzen or a Lethem, and no one wants to coronate Nicholas Sparks, I presume). Just as likely, perhaps critics simply want to be contrarian (see Roger Ebert vs. Armond White).
This isn’t necessarily a bad thing, but it could help explain why the stereotype “white, male, from Brooklyn, and who teaches creative writing” is so well represented in Literary reviews. In Leonard Mlodinow’s The Drunkard’s Walk, he writes about the randomness of super-success. It is not that the difference between good and bad is a crap shoot, but that we can’t really predict why some books and movies do blockbuster business while others designed for that purpose go ignored. It is telling that one piece of research Mlodinow presented has to do with music and how it is ranked. Two cohorts of subjects were asked to rank songs. The difference was that one cohort had no knowledge of how others ranked the songs, while the second did. The first cohort ranked songs in a distributed manner: the “likes” were spread over many songs. The second cohort had a “sharper” profile, where a few songs garnered high rankings. Thus judging books by criticism or by sales might be a reflection of herd mentality.
It is no secret that our opinions and evaluations can also hang on inconsequential details. The canonical stories come from orchestra auditions, where female performers were usually relegated to second-chair status – unless the auditions occurred with the performer behind a screen. Even among performers of relatively equal looks and talent (for whatever it’s worth, the researchers aimed to build the most homogeneous of samples), the manner of dress and visual style could influence what evaluators think. If one listened to these performers without visual cues, one would be hard pressed to tell the difference (that was also an experiment in the study). It seems strange that we are all so concerned with “the best,” when even the most informed opinion remains just that, an opinion. I am not sure it is meaningful to make distinctions between the levels of good a writer achieves, because the evaluation depends so much on how the critic is feeling at that particular moment.
One final example: in other posts on this blog, I have tried highlighting the research of Dave Berri, who has done a fair bit of work documenting how even recognized experts in a field may not be using the right metric or standard for evaluating talent or productivity. In sports, we have all the pertinent information to judge such matters. However, it is difficult to make the same assessment of the worth of books, music, movies, food, wine, and so on. There are technical aspects to discuss, sure, but past some level of proficiency, it becomes a matter of opinion whether one book is better than another.
To the sincere critics who wish to look for something new, I would add the following thoughts. Because I feel strongly that my verdict on a book (good or bad) is irrelevant, I take pains to write simply about my engagement with the story, themes, ideas, and characters in a book. I pitch what I write here as taking part in a discussion; I prefer to call these essays about books rather than analysis or criticism. I try to place the books not in the authors’ context but in my own (within constraints). I understand fully that what I say here is not authoritative and is merely an opinion. The most I hope for is that you find my opinions thoughtful and an interesting point of view.
The wrong metric
Although this blog is ostensibly about books, I’ve written a lot about sports, mostly dealing with how non-scientist readers perceive statistical analysis of athlete productivity. This issue fascinates me; I think how people think about sports statistics provides a microcosm of how they may respond to similar treatments in the scientific realm. Economists, mathematicians, engineers, and physicists will provide a better explanation of the analysis than I can. Instead, I want to focus on the people who draw (shall we say) interesting conclusions about research.
In a recent podcast, Bill Simmons interviewed Buzz Bissinger on the BS Report (July 28, 2010). Bissinger gained some negative exposure as he had railed against the blogosphere and sports analysis. In this podcast, Bissinger was given some time to elaborate on his thoughts. He most certainly is not a raving lunatic, but he did say a few things that I find representative of how statistical analyses are often misinterpreted by non-scientists (and even scientists.)
Bissinger took the opportunity to trash Michael Lewis’s Moneyball, mostly by pointing out how Billy Beane isn’t so smart and that, in the end, the statistical techniques didn’t work: only Kevin Youkilis, mentioned in the book, had proven to be a success. I think that misses the point. Yes, the book documents the tension between the scouts and the stat-heads. I think Lewis chose this approach, taking the human interest angle, to make the book more appealing than a simple technical description of Beane’s “new” approach. Perhaps Lewis overstates the case in showing how entrenched baseball GMs were in relying on eyeball and qualitative skill assessments, but the point I got from the book was this: Beane worked under money constraints. He needed a competitive edge. Most baseball organizations relied on scouts. Beane thought that to be successful, he needed to do something different (but presumably relevant) to produce baseball success.
Beane could have used fortune tellers; I think the particular technique in Moneyball (i.e., statistical analysis) is beside the point. Beane found something that was different and based more of his decisions on this new evaluation method. This is a separate issue from how well the new techniques performed. The first issue is whether the new technique told him something different. As it happens (as documented in Moneyball, Bill James’s Baseball Abstracts, and by many sports writers and analysts), it did. The result is that Beane was able to leverage that difference – in this case, he valued some abilities that others did not – and signed those players to his roster. The assumption is that if his techniques couldn’t give him anything different from previous methods of evaluation, then he would have had nothing to exploit.
The second point is whether the techniques told him something that was correct. And again, the stats did provide him with a metric that has a high correlation with winning baseball games – on-base percentage. So one thing he was able to exploit was the difference in perceived value between batting average (BA) and on-base percentage (OBP). He couldn’t afford to sign power hitters: GMs – and fans – like home runs. He avoided signing hitters with high BA and instead signed those with high OBP.
This led to a third point: Beane can only leverage OBP to find cheap players (and still win) so long as few GMs are doing the same. Of course the cost of OBP will increase if others with deep pockets (like the Yankees and the Red Sox) come on board. So Beane – and other GMs – would have to become more sophisticated in how they draft and sign players, especially if they work under financial constraints. As my undergraduate advisor said, “You have to squeeze the data.”
One valid point Bissinger made was that the success of the Oakland A’s coincided with the Big Three pitchers. So, clearly, Bissinger attributed a significant amount of Oakland’s success to the three. That’s fine, as the question can be settled by looking at data. What annoyed me is when readers do not pay attention to the argument. I just felt that Moneyball was more about how one can find success by examining what everyone else is doing and then doing something different. The only constraint is whether that something different would bring success.
I felt that Bissinger is projecting when he assumes that using stats means the rejection of visual experience. The importance of Moneyball is in demonstrating that one can find success by simply finding out what people have overlooked. Once the herd follows, it makes sense to seek out alternative measures, or, more likely, to find out what others are ignoring. If the current trend is on high OBP and ignoring pitchers with a high win-count, then a smart GM needs to exploit what is currently undervalued. Statistics happens to be one such tool – but it isn’t the only tool.
And part of the reason I write this is, again, to highlight the fact that people usually have unvoiced assumptions about the metrics they use. The frame of reference is important. In science, we explicitly create yardsticks for every experiment we perform. We assess things by whether they differ from a control. It is a powerful concept. And even if the yardstick is simply another yardstick, we can still draw conclusions based on differences (or even similarities, if one derives the same answer by independent means).
This brings me to recent Joe Posnanski and David Berri posts. The three posts I selected all demonstrate the internal yardsticks (hidden or otherwise) that people use when they make comparisons. I am a fan of these writers. I think Posnanski has provided a valuable service in bridging the gap between analysis and understanding, facts and knowledge. Whether one agrees or disagrees with his posts, I think Posnanski is extremely thoughtful and clear about his assumptions and conclusions, which facilitates discussion. His post has a simple point: Posnanski wrote about “seasons for the ages.” A number of readers immediately wrote to him, complaining that just about anyone who hits 50 home runs in a season would qualify. To which Posnanski coined a new term (kind of like a sniglet) – obviopiphany. He realized that most people simply associate home runs with a fantastic season for a hitter. That isn’t what Posnanski meant, and in the post he offers some correction.
The Posnanski post has a simple theme and an interesting suggestion: the outrage over steroids may be due to the fact that people assume that home run hitters are good hitters. Since steroids help power, the assumption is that steroids make hitters good – which in most cases simply means more home runs. But Posnanski – and other sabermetricians – propose that one must hit home runs in the context of fewer strikeouts and more walks. The liability involved in striking out more, and not walking, is too great and washes out the gains made from hitting the ball far. Thus Posnanski names five players who are not in the Hall of Fame and aren’t home run hitters, but who nevertheless produced at the plate – according to some advanced hitting metrics. I won’t go into this more, except to say that here, Posnanski makes his assumptions clear. He uses OBP+, wins above replacement player, and other advanced metrics to make his point. But it is telling that Posnanski had to spell out the assumption his readers had – that the yardstick for good hitting simply boils down to home runs.
The Berri posts describe something similar. One of them is from a guest contributor, Ben Gulker, writing about how Rajon Rondo was not going to be selected for Team USA in the world championship because he doesn’t gather enough points. The other highlights how the perception of Bob McAdoo changed as a function of the fortunes of his team. Interestingly enough, McAdoo became a greater point getter while becoming a less efficient shooter and turning the ball over more; at the same time, his reputation was burnished by the championships his teams won.
The story has been told many times by Berri. It seems that, in general, basketball writers and analysts identify good players as those who score points (in the literal sense, regardless of shooting percentage) and who played on championship teams. There are several problems here. Point getting must take place in the context of a high shooting percentage. One must not turn the ball over, one must rebound, one must not commit an above-average number of fouls, and one should hopefully get a few steals and blocks. I don’t think anyone would disagree that such a player is a complete player and ought to be quite desirable, regardless of how many championship rings he has or whether he scores only 12 points a game. Berri has examined this issue of yardsticks, and he has found that what sportswriters, coaches, and GMs think of players has an extremely high correlation with, simply, how many points they score (this is shown by what the writers write and how they vote for player awards, how often coaches play someone, and how much GMs pay players). The write-ups about defensive prowess and the “little things” are ignored when the awards are given and the fat contracts handed out. Point getters get the most accolades and the most money.
And the other point is how easily point getters reflect the luster of championships. Never mind that no player can win alone; this again is an example of how people end up not only with unspoken yardsticks, but also with a frame of reference chosen without analyzing whether it is the correct one. The reference point is a championship ring. As has been documented, championships are not good indicators of good teams. The regular season is. This is simply due to sample sizes. More games are played in the regular season. Teams are more likely to arrive at their "true" performance level than in a championship tourney with a variable number of games – and frankly one where streaks matter. A good team might lose four games in a row in the regular season and still lose only 10 for the year; in a playoff series, losing four in a row means elimination.
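A back-of-the-envelope calculation makes the sample-size point concrete. The sketch below is purely illustrative: the 60% per-game win probability and the plain binomial model are my assumptions, not measurements of any actual team.

```python
# Illustrative sketch: how often does the genuinely better team survive a short
# series versus a long season? Assume (hypothetically) it wins any single game
# with probability 0.60.
from math import comb

p = 0.60

# Winning a best-of-7 series means winning at least 4 of 7 games
# (ending the series early does not change this probability).
p_series = sum(comb(7, k) * p**k * (1 - p)**(7 - k) for k in range(4, 8))

# Finishing above .500 over an 82-game regular season.
p_season = sum(comb(82, k) * p**k * (1 - p)**(82 - k) for k in range(42, 83))

print(f"survives a 7-game series: {p_series:.2f}")            # roughly 0.71
print(f"finishes above .500 over 82 games: {p_season:.2f}")   # roughly 0.96
```

In other words, under these assumed numbers the better team gets bounced from a seven-game series nearly a third of the time, while the long season almost always sorts it above .500.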
In this context, the Premier League system in soccer makes sense. The best teams compete over a regular season; the team with the best record is the champion. So people who assume that a point getter on a championship team is better than a player who shoots efficiently (but with fewer points), rebounds/steals/blocks/does not turn the ball over at an above-average rate, and plays on a non-champion team, make two errors. They have selected the wrong metric twice over.
With that said, I could only have made that point because of newer metrics that provide another frame of reference. Moreover, the new metrics tend to have improved predictive abilities over simply looking at point-getting totals. Among the new metrics, there are some that show a higher correlation with the scoring difference (and thus win/loss record) of teams. It doesn’t matter what they are, but an important point is that one can derive these conclusions about which metric is better or worse.
This is the main difference between scientific discourse (in which I include athlete productivity analysis) and lay discourse. In the former, the assumptions are laid bare and frame the discussion. A good scientific paper (and trust me, there are bad ones) gives excruciatingly detailed descriptions of controls, the points of comparison, any algorithms/formulae, and how things are compared. In lay discourse, this isn't the standard one would use, because communicating scientific findings to other scientists relies on a stylized convention. Using such a mode of communication with friends would make one a bore and a pedant – not to mention one would become lonely real quick.
Statistical certainty
I read Bill Simmons's The Book of Basketball. I enjoyed his book, as it is a fun survey of NBA history. The book isn't just a numbers game or just breaking down plays. It includes enough human interest elements that it should appeal to a casual fan or to indifferent parties (like me; I can count the number of basketball games I've seen – TV or live – on both hands). Simmons does a fantastic job of conveying his love of basketball. For me, he really brought different basketball eras to life, inserting comments from players, coaches, and sportswriters. He also seems fairly astute in breaking down plays and describing the flow of the game.
Yes, I bought the book because I like Bill Simmons's writing. If you enjoy his blog, you will find the same breezy conversational style here. The man has a gift for dropping pop culture references and making them germane to his arguments. But what I like most is that he is earnest in trying to understand, and to make his readers appreciate, the people who play a game for a living.
His segment on Elgin Baylor was moving in showing how racism affected this one man; in some ways, it was probably more effective than if he had just talked in general terms about the 1960s. His whole book works because it stays at the personal level. Even in his discussion of teams and individual players, he takes pains to discuss how each person was and is regarded by his peers and teammates.
In this way, I think Simmons did a fantastic job of making the case that basketball can contain as much historical perspective as baseball. This is something that should not have to be argued. Baseball has a lock on "the generational game by which history can be measured" status. What seems important is that there are human elements that make a game accessible between generations: fathers taking their sons to the games, talking about the games and players, the excitement of watching breathtaking physical acts that expand how one views the human condition, and the joy and agony of championship wins and losses. While baseball's slow pace lends itself to the way history moves (periods where nothing seems to happen, punctuated by drama), that doesn't mean other sports happen in a vacuum. Style of play, the way the players are treated, and the composition of the player demographic all reflect the times. These games can be a reflection of society, and one can see the influence of racial injustice in something as mundane as box scores as integration occurred.
Simmons blends basketball performance, history, and social environment effectively; examples can be found in his discussions of Dr. J, Russell, Baylor, Kareem, and Jordan. In discussing why there probably won't be another Michael Jordan (or Hakeem, or Kevin McHale), he takes inventive routes. Most of his points relate to societal and basketball-environment pressures. Players are drafted sooner, the high pay scale for draft picks lowers the motivation to prove their worth, and perhaps society itself would actively discourage players from behaving as competitively as Jordan did. I suppose it's interesting, but I'm not sure it matters so much if the player is perceived to be an excellent player. Regardless, it seems to me that Simmons has been thinking about these things for some time. And I found it fun to read his take on basketball.
And I liked this book because it gives the lie to the weird view that someone who hasn’t done something cannot make reasonable, intelligent statements about it. Simmons wasn’t a professional basketball player, but he certainly uses every resource available to absorb the history and characters populating the game. He read a fair bit, he watched and rewatched games, he talked to players, he talked to people who covered basketball and he watched some more. And he isn’t afraid to raise issues that occur to readers; you’ll see what I mean when you read his footnotes.
The book (and his podcast) confirms my opinion of Simmons as the smart friend who'd be a blast to have (one who bleeds Celtics green, watches sports for a living, and must keep up with Hollywood gossip, gambling, and pop culture because it gives him ammunition for columns).
***
There are some issues with the book, mainly in how statistical analysis of basketball is portrayed. I should be upfront and say that these issues did not detract from his arguments (for reasons that will be clear later), but I wish he would reconcile eyeball and statistical information. And because I’ve decided one focus of this blog should be how non-scientists deal with science (and scientists), I thought I should offer some thoughts on some of these issues.
I am somewhat undecided about how Simmons (and I suppose I am using him as a proxy for all "non-scientists") actually feels about statistics. He claims that team sports like basketball and football are fundamentally different from baseball; the team component of the former increases the number of additive and subtractive interactions, while the latter game is composed of individual units of performance. Thus the increase in complexity makes it difficult to model. So he discards so-called simple measures of NBA player performance like WP48, PER, and adjusted plus-minus.
His rationale is that these indicators ought to back up existing observations about NBA players. So Kobe Bryant needs to be ranked as a top-20 player of all time (WP48 ranks Bryant as a superior player – like Paul Pierce – but not a step or two behind Michael Jordan). It seems like he wants statistics to tell him what he wants to hear, when in fact statistics help you see things you don't see.
But then that leads to my second point about Simmons: why does he need the model to back up his mental model of player performance? Put differently, why is it that he cannot accept differences in rankings calculated by some turn-the-crank-spit-out-value model? I think Simmons lacks a nuanced view of how these numbers ought to be interpreted, and that he refuses to see that a simple model can capture a great many things about a complex system. Sure, once you’ve set up your criteria (like some level of significance you are willing to accept), you align everything by it, but there is room for some judgement as to where that line is drawn.
Another way of describing a complex system is to say that there are many things going on at once, and they are all interacting in some way. There are 10 players on a basketball court. One player, with the ball, has options to pass, to shoot, or to move the ball. Within each of these options, he has a set of suboptions: which one of the other four guys do I pass to? Who's open? Which open player has a good shot from where he is? Am I in my optimal position to shoot? Do I need to drive to the basket or kick the ball out to the perimeter? There are many more possibilities than these.
***
At one level, Simmons is right; it is useful to break things down into "hyperintelligent" stats – identifying the tendencies of players (whether a player likes breaking to his left or right when he starts driving from the top of the key, whether he is equally good shooting with his left or right hand, how often he does a turnaround, a fadeaway, or drives to the hoop), trying to figure out how many forced errors a defender creates, how often unforced turnovers happen (like someone dribbling off his foot), how many blocks get slapped out of bounds versus tipped to gain possession, and so on.
But isn’t it just as intelligent to find an easy way of collapsing the complex game into a simple “x + y” formula? On several occasions, Simmons uses a short quote (and praises the person who said it) that captures everything he wanted to say in 15 pages. A simple model is analogous to that short quote.
More importantly, what if we didn’t need all these hyperintelligent stats to capture the essence of the game?
I just switched the problem from one of identifying player performance and productivity to one of capturing the game in broad strokes. The two ideas are of course related, but they are still distinct and should not be conflated.
This gets back to the original motives of the person who does the modeling.
If it's a scientist or economist, I'll tell you now that he is interested in getting the most impact with the least amount of work. He probably has to teach, run a lab or research program, and write grants and publications. He doesn't have time to break down game film. And he certainly does not have the money to hire someone to look at game film (although I am sure he'd have no lack of applicants for the job). He spends his money finding people to do research and teach. If his research program involves finding ways to measure worker productivity, he will probably start with existing resources. So fine; he now has a database of NBA player box scores.
He'll want to link these simple measures of player output to wins and losses. But players score points, not wins, and thankfully the difference between points scored and points given up correlates extremely well with wins and losses.
From there, it is relatively simple to do a linear regression for all players on all teams, finding how each of the box score stats relates to the overall points scored for each team. And as noted, some metrics have a higher correlation with the point difference (I will not use the term differential to mean difference; differential belongs to diff EQs). Regardless, it seems an affliction for males that they rank things; so the researchers have these numbers, and it's trivial to list players from high to low.
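To make the turn-the-crank part concrete, here is a minimal sketch of that kind of regression. It is not Berri's actual model; the stat categories, the synthetic numbers, and the use of ordinary least squares via statsmodels are all my assumptions for illustration.

```python
# Minimal sketch (not anyone's real model): regress a team's point difference
# on aggregated box-score rates to see which stats "matter".
# The columns and numbers below are invented for illustration.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n_teams = 30
X = np.column_stack([
    rng.normal(0.46, 0.02, n_teams),   # field-goal percentage
    rng.normal(42, 3, n_teams),        # rebounds per game
    rng.normal(14, 2, n_teams),        # turnovers per game
    rng.normal(7.5, 1, n_teams),       # steals per game
])
# Synthetic point difference, built so the "true" coefficients are known
point_diff = (200 * (X[:, 0] - 0.46) + 0.8 * (X[:, 1] - 42)
              - 1.2 * (X[:, 2] - 14) + 1.0 * (X[:, 3] - 7.5)
              + rng.normal(0, 1.5, n_teams))

model = sm.OLS(point_diff, sm.add_constant(X)).fit()
print(model.params)    # estimated coefficients: how much each stat moves the point difference
print(model.rsquared)  # how much of the point difference the box score explains
```

The coefficients are the whole point: they say how much the point difference moves when a given box-score stat moves, which is how one can argue that one stat "matters" more than another.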
Now, here's another consideration. In this, and in other branches of science, the data are not "clean". We scientists (generally) assume that the phenomenon we are observing conforms to a "normal" distribution – that is, there is some true state for the thing we observe (estimated by taking the average of our observations), and the individual observations hover around this true state (or average). So there is variation around the mean.
In my research, for example, I can measure neural responses in the olfactory bulb. I use optical indicators of neural activity; essentially, the olfactory bulb lights up with odor stimulation. The more the neurons respond, the brighter things get. The olfactory bulb is separated into these circular structures called glomeruli. Each glomerulus receives connections from the sensory neurons situated in the nose and the output neurons of the olfactory bulb (some other cells are also present, but they aren’t important for this story.)
When a smell is detected by humans (or animals and insects), what we mean is that some chemical from the odor source has been carried, through the air, into the nose and neurons become active (they fire “action potential spikes”). And the pattern of this activity, at the olfactory bulb, is quite similar – but not exactly the same – from animal to animal.
Sometimes, we see fewer responses to the same smell. Other times, we see a few more responses. Sometimes we see a different pattern from what we expect. Sometimes, we see no responses. This might happen once every 15 animals. That is not enough to take away from our general, broad-strokes understanding of how this part of the brain processes smell information. In most cases, some of these things can be explained technically; the animal was in poor health, or our stimulus apparatus had a leak, or the smell compound was degraded. We know this because we can improve the signal by fixing the equipment or giving the animal a drug to clear up its nose (mucus secretion – snot! – is a problem).
And as a direct analogy to this WP48 vs. "hyperintelligent stats" problem, we find that a complex smell (composed of hundreds of different chemicals) may be "recreated" using a few of those chemicals. There is good empirical evidence that this is the case: prepared-food manufacturers and fragrance makers can mimic smells and flavors reasonably well. This is akin to capturing the essence of the smell (or sport) with a few simple chemicals (or box scores). And generally, we don't even need people to describe to us what they smell to figure this out (i.e. break down game film to create detailed stats). We can simply ask them to answer a simple question: do these two things smell the same to you, yes or no? Thus "complex" brain processes and decision making can be boiled down into forced-choice test results. Do we lose information? Yes, but everyone realizes this is a start. As we know more, and new technology becomes available, we can do more and ask more with less effort. Then we will be able to better use the information we have. As far as I know, most statheads have access to box scores (although there is nothing to stop them from breaking down game film aside from time and money).
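For what it's worth, here is a minimal sketch of how such a same-or-different test might be scored. The trial counts are invented, and the simple binomial test (via scipy) is my assumption about one reasonable way to analyze it, not a description of any particular study.

```python
# Sketch of scoring a forced-choice "do these smell the same?" experiment.
# Counts below are made up for illustration.
from scipy.stats import binomtest

n_trials = 40    # pairs presented: full mixture vs. a few-component mimic
n_correct = 24   # trials where the subject correctly said "different"

# If the mimic truly captured the smell, the subject should be at chance (p = 0.5).
result = binomtest(n_correct, n_trials, p=0.5, alternative="greater")
print(result.pvalue)  # about 0.13 here: no strong evidence the subject can tell them apart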
But that's the broad-strokes view. If we get into the details (that is, as if we started working with the "hyperintelligent" stat breakdowns), we find that of course there is more going on, and that the differences we see are not only technical issues. For example, the pattern of activity we see differs slightly from animal to animal because the cells that form connections with the olfactory bulb do not hit exactly the same spot. And even if we can use a single chemical to recreate a smell, the result is still different enough that humans can generally tell something is missing. So the other chemicals are in fact detected and contribute some information that the brain uses to form the sensation of smell. And we know that the way neurons respond to a single chemical differs from how they respond to a mixture, confirming that there is in fact additional information being transmitted.
The important point is that the simple model captures an important part, but not all, of the complex system. One problem that can occur with increasing the complexity of models is overfitting: the model becomes applicable to one small part of the system rather than the whole. Even game-film breakdown can hinder you if it gives you so many options that you are back where you started. You'd probably avoid focusing on rare events and just concentrate on the things that happen often – which, again, is the point of a simple model.
The intense breakdown of game film to provide detailed portraits of player effectiveness could be combined with the broad-strokes analysis. A metric like WP48 can tell a coach where a player is deficient. The coach can use the detailed breakdown to figure out why the player isn't rebounding, passing, shooting well, and so on. That's where things like defensive pressure, help defense, and positional analysis can be used for further evaluation. And I'm not sure stat heads have argued otherwise.
Deficiencies of statistical models
As in, the things that models explicitly ignore.
One thing statistical models do not address is the fan's enjoyment of a player. Actually, I suppose one might be able to simply chart the percent capacity of stadiums when a particular player comes to town, but that's not an argument I think Simmons would make. There's something to be said about how a player scores: Simmons pays tribute to Russell and Baylor, the first players to make basketball a vertical game. He cites Dr. J as introducing the urban playground style into basketball. He loves talking about the egos of players, especially when a player takes an MVP snub personally and then dominates the so-called MVP in a subsequent game.
Simmons also offers a rebuttal to PER, adjusted plus/minus, and "wages of wins" metrics in his ranking of Allen Iverson – by saying that he doesn't care. It's sufficient for him that he finds Iverson a presence on the court. Iverson's emotions are acted out as basketball plays. Simmons finds Iverson's toughness and anger on the court fascinating to watch.
But Simmons does use metrics: the standard box scores. I would ask this: if Iverson didn’t score as much as he did, would Simmons still care? As Berri has noted, the rankings by sportswriters, the salaries given to scorers, and PER rankings all correlate highly with volume scoring (i.e. the points total, not field-goal percentage). Despite the tortured arguments writers might make, and the lip service given to building a lineup with complete players, “good” players are players who score a lot.
However, I should be clear and say that Simmons's approach does not detract from his defense of his rankings. He uses player and coach testimonies, historical relevance, the visual appeal of playing styles, sports writers, and the box scores to generate a living portrait of these players as people. Outside of the box scores, there is enough grist for the mill. I would suggest that it is these arguments that make the whole argument process fun. Even in baseball, supposedly the sport with the most statistically validated models of player performance (and Berri would argue that basketball players and their contributions to team records are even more consistent), there are enough differences of opinion concerning impact, playing styles, and relevance to confound Hall of Fame/MVP arguments (see Joe Posnanski).
Because Simmons is upfront about his criteria (even if the judgement of each might not be as "objective" as a number), it is fine for him to weight non-statistical arguments for greatness. That's how he defined the game. Just as Berri defined "player productivity" in terms of his WP48 metric. Because Berri publishes in peer-reviewed journals, he needs methods that are reproducible. Science, and the peer review process in general, is a different process from writing books or Hall-of-Fame arguments or historical rankings. The implicit understanding of peer review is that the work is technically sound and reproducible. Berri cannot take the chance of publishing a Simmons-like set of criteria and having other sports economists "turn the crank" and come out with different rankings. But Berri can publish an algorithm, and proper implementation will yield the same results.
Does this mean that Berri is right? Or that a formula is better than Simmons's criteria? Mostly no. The one time it is "better" is when one is preparing the analysis for peer review. In that case, it is nicer to have a formula, or a process, or a set of instructions, that yields the same result each and every time the experiment is run. In other words, we try to remove our bias as much as possible. Bias here does not mean anything pernicious; it is just a catch-all term for how we think a certain way (with our own gut feelings about the validity of ideas and research directions). Being objective simply means we try to make sure that our interpretation conforms to the data, and that the work is good enough that other researchers come to the same general conclusions.
I think Simmons actually doesn't need to trash statistics, nor does he need to ignore them. Once he establishes ground rules, he can emphasize or deemphasize how important box scores are in his evaluation. As it is, I found his arguments compelling. His strength, again, is making basketball history an organic thing. He does his best to eliminate the "you had to be there" barrier and tries to place the players in the context of their time.
Now, one might ask why stats can't be used to resolve these arguments about all-time greats. Leaving aside the issue of different eras (and frankly, this can be addressed by normalizing performance scores to the standard deviation for a given time period, as Berri does here), there is the issue of what the differences in these metrics mean. In the same article I cited, Berri reports that the standard deviation for the performance of all power forwards, defined by his WP48 metric, is about .110. The average basketball player has a WP48 of .100. Kevin Garnett, for example, has a WP48 (2002-2003) of 0.443. That translates, roughly, into Garnett being more than 4x as productive as an average player, though normalized to the standard deviation he is only about 3.5x as productive.
But how different is a power forward from Kevin Garnett if that forward has a WP48 of 0.343? One might interpret this to mean that Garnett is still nearly 1 standard deviation better than the other player, but it could also mean that their performances fall within 1 standard deviation of each other. Depending on the variation of each player's performance for a given year, compared to his career mean, they could be statistically similar. That is, the difference might be accounted for by the "noise" of slight upticks and downticks in rebounds/assists/steals/turnovers/shooting percentages/blocks. If you prefer, how about the difference between a .300 hitter and a .330 hitter? Over 500 at-bats, the .300 hitter has 150 hits, and the .330 hitter has 165; the difference would be 15 hits over the course of a season. Are the two hitters really that different? The answer would depend on the variability of batting average (for the compared players) and how these numbers look with a larger sample (over a career of 5,000-plus at-bats, for instance). The context for the difference must be analyzed.
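As a rough check of that batting-average comparison, one can treat each at-bat as an independent coin flip and ask how much season-to-season noise that implies. This is a simplification I am assuming for illustration, not a full model of hitting.

```python
# Rough check: is a 15-hit gap over 500 at-bats large compared to the noise?
from math import sqrt

n_season = 500
p_true = 0.300

sd_season = sqrt(n_season * p_true * (1 - p_true))  # ~10.2 hits of binomial noise
print((165 - 150) / sd_season)                       # ~1.5 standard deviations

# Over a 5000 at-bat career the same 30-point gap stands out much more:
n_career = 5000
sd_career = sqrt(n_career * p_true * (1 - p_true))   # ~32 hits
print((0.330 - 0.300) * n_career / sd_career)        # ~4.6 standard deviations
```

Under this simple assumption, a single .330 season is only about a standard deviation and a half away from what a true .300 hitter might produce by luck, while a .330 career is not.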
Here's another example: let's assume that Simmons's rankings and Berri's metric turned out similar lists, perhaps in a different order (one difference is that Iverson would be nowhere near Berri's top 96). And further, let us assume that the career WP48 scores are essentially within 1.5 standard deviations of one another. How might Simmons break with the WP48 rankings?
Let us tackle how Berri would have constructed his ranking: he would simply list players from highest to lowest WP48. That's probably because he is in peer-review-article mode. And frankly, if you profess to have a metric, why would you throw it out? You might if, like Simmons, you defined the argument differently. For his Pyramid of Fame rankings, Simmons lists several criteria that have nothing to do with basketball productivity. Again, historical relevance, player and coach testimony, and the style and flair of the players enter into Simmons's arguments. So, all things being equal, and if the difference in rankings by the metric is slight, there really is no reason to weight the statistics more heavily than any other attribute. Heck, even if the metric differences are large, it wouldn't matter. Simmons likes his other arguments more anyway.
But if you do talk about the actions on the court, then I believe you are in fact constrained. Of the metrics I mentioned, WP48 offers a high correlation with point difference and thus with win-loss records. Further, some of the other metrics actually correlate with points scored by players, suggesting that there is little difference between those metrics and simply looking at the aggregate point total. So there are in fact models that do reasonably well in predicting and "explaining" the mechanics of how teams win and lose.
In a way, I think the power of a proper metric is not in ranking similarly "productive" players, but in identifying the surprisingly bad or good ones. Iverson is an example of the former; Josh Smith (of the 2009-2010 Hawks) of the latter. It might not be as powerful a separator of players with similar scores, because their means essentially fall within 1 standard deviation of one another; in essence, they are statistically the same. In this case, it helps to have other information to aid evaluation (and this isn't easy; as Malcolm Gladwell has written, and Steven Pinker has taken issue with, some measuring sticks are less reliable than others).
Another example where statistics is powerful is in determining, in the aggregate, whether player performance varies from year to year. Berri found that it largely doesn't, suggesting that the impact of coaching and teammate changes may not be as high as one thinks. However, such a finding in no way precludes coaches and teammates from having an effect on particular players. It just means that such cases are too few to affect the mean. Or perhaps it suggests that coaches are not using information properly to make adjustments that meaningfully change player performance. Overall, I suppose, one reason Simmons hates advanced stats and rankings is that he isn't sensitive to the importance of the standard deviation; ironically enough, he applies the mean tyrannically when there is such a concept as statistical insignificance.
But Berri has never pushed his work as a full explanation of the game of basketball. First, he doesn’t present in-game summaries: he only looks at averages over time. There’s nothing in his stat to indicate the ups and downs (i.e. standard deviation in performance) a player experiences from game to game. Even in baseball, hitting .333 does not guarantee a hit every 3 at-bats. It just means that over time, a hitter’s hit streaks and lulls add up to some number that is a third of his at-bats. Berri’s metric (and any other work that proposes to measure player performance) certainly cannot predict what a given box score would be, for a given game, for a given player.
Regardless, I do not see a problem with Simmons ranking his players. Simply put, he values entertainment as much as production. I would say he values the swings in performance just as much, if not more (more on this later). Yes, he says stats do not matter, but of course they do. It's telling that the scoring lines he cites in admiration all lead with a high point total or points per game. And if you can't shoot, rebound, pass, steal, or block, and you cough the ball up a lot, it doesn't matter how pretty you make everything look.
No-no’s
Joe Posnanski has pointed out that whenever someone trashes stats, that person tends to offer other, supplemental numbers to back up his point. In other words, the disagreement isn't about statistics per se, but about the distinction between "obvious" stats and "convoluted" stats.
Even if one disagrees with basketball statistics, one can at least believe that statheads came up with a formula first and turned the crank before comparing the readout with their perceptions of players. Hence Simmons blowing up when PER or WP48 doesn't rank his favorites highly.
Simmons approaches this from the opposite direction. He has an outcome in mind and "builds" a stat/model to fit it (like his 42-Club). But he mistakes his way of tinkering for what modelers actually do. Berri arrived at his model by performing linear regression on the box score stats and seeing how each relates to the point difference. It isn't an arbitrary way of deriving some easy-to-use formulation. The regression coefficients are meaningful: they say that if you increase shooting percentage by this amount, the point difference goes up by that amount. It so happens that points scored by a player did not increase the point difference. And he built it using all players; it's strange to decide beforehand which players are great and then build a metric around that. Why even bother in the first place?
And for Berri to report differently on these aggregate data because Kobe isn't ranked any higher would actually amount to scientific fraud. But as I noted above, applying these WP48 rankings isn't as hard-and-fast a process as Simmons thinks. There is some room for flexibility, depending on what one is trying to accomplish.
In general, I agree that more breakdowns of the game would be useful, in the sense that more data are always nice. The problem, for academics, is that these stats might remain proprietary, and they become difficult to apply across all teams. Even if we could get all the "hyperintelligent" stat breakdowns from a single team, it is unclear whether other teams would view the breakdown in the same way. The utility for examining general questions about worker (i.e. player) productivity for academic publication becomes less clear. The database ought to help the teams – assuming they are intellectually honest enough to verify that their stats produce a better picture of player productivity and aren't just impressed by the gee-whiz-ness of it all. My guess is that they won't be entirely successful, as Simmons still has a job trashing bad GM decisions.
Standard Deviations
Why do I watch sports? It seems to be for the same reason Simmons does. He watches over a thousand hours of sports each year, waiting for the chance to see something he has never seen before. Something that stretches the imagination and the realm of human physical achievement.
I feel the same way; I am team and sport agnostic, and although I used to follow Boston Bruins hockey religiously, I left that behind in high school. Although I have lived in Boston from the age of 7 onwards, I had not been infected by the Red Sox or Celtics bug (even during their mid-80s run). I did root for the Red Sox in 2003 and 2004, but that was because of the immense drama involved in the playoff games against the Yankees. And Bill Simmons's blog for the season.
Perhaps I prove Simmons’s point about stat heads; I like to say that I am interested in sports in the abstract. I like the statistical analysis for the same reason Dave Berri had pointed out in his books. There is a wealth of data in there to be mined. I thought one good example of the type of research that can come from these data is finding evidence for racial bias in the way basketball referees call games.
However, what got me interested in watching professional sports was Simmons writing about it. Although I didn’t watch football, basketball, or baseball for a long time, I did watch the Olympics and, believe it or not, televised marathons. Partly it was because my wife and I were running, but mostly I saw the track and field type sports as a wonderful spectacle. So it wasn’t that much of a stretch to fall into a stereotypical male activity.
At any rate, I was amazed by Usain Bolt's performance in the 2008 Summer Olympics. I was disappointed when Paula Radcliffe injured herself during the Athens Olympics, and then relieved when she won the NYC marathon, setting a new speed record in the process. I rooted for Lance Armstrong to win his seventh Tour. I rooted for the Patriots to get their perfect season. And until the Colts lay down and the Saints lost a couple of weeks ago, I wanted the Colts and the Saints to meet in the Super Bowl, both sporting 18-0 records. I was glad that the Yankees won the World Series, and with that fantasy baseball lineup, I hope they continue to win. I want to see the best teams win, and win often. And yes, I wish the regular-season records lined up with the championship winners for a given season. Then we wouldn't have arguments about best regular-season records versus championship winners.
This isn't because I'm a bandwagon fan; I watch sports now for the same reason that Simmons does: to see the best of the best do great things. But not always, because they might have a competitor who wants it more, and at times the best fail. This drama is the power of sports.
And I can see why Simmons argues so passionately against stats. He likes the visceral impact of sports. I can say that Bolt ran a 9.69s 100 m. But it was nothing compared to seeing Bolt accelerate, distance himself from the other runners, and then slow down as he pulled into the finish line. He blew away the competition. My eyes were wide and my mouth hung open: he slowed down! And he was 2 strides ahead of everybody. And he set a new record. Even if Bolt didn’t set the record, he still made it look easy. On the field, on that particular day, he out-classed his competitors. It is watching the struggle of the competitors (like Phelps winning the 100m fly by 10 milliseconds), on that day, that matters. Over time, if one didn’t watch that particular heat, then the line World Record: Usain Bolt, 100 m, 9.69s doesn’t quite hit you the same way.
But then, there is this. What if instead of looking at the single race, you looked at the athlete performing in 8 or 20 or 50 events for a year? And at these events, the same set of athletes compete over and over?
Here are some possible outcomes: Phelps and Bolt lose all their other events, essentially giving us a single transcendental moment. Phelps and Bolt win half their meets. Phelps and Bolt utterly dominate the field, winning 65% or more of their meets.
In the first case, we would probably admit that the Phelps and Bolt phenomenon was a one-off. For whatever reason, the contingencies (no sports gods or stars aligning here!) lined up such that they pulled off highly improbable feats (improbable, but not impossible; this distinction is the point of this section). The third case proves our point: they are not perfect, but they sure are good. The second case is a bit trickier. Since they are right on the borderline, we need some analysis to help us decide. One way might be to sum up our individual observations about these two: being .500 while giving us a single breathtaking moment might be persuasive. Or one might look at how everybody else did (Phelps and Bolt might have won 50% of the time, but if the remainder is split among their competitors, they have still dominated the field).
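That last parenthetical can be made concrete with a quick calculation. The field size and meet count below are assumptions I picked for illustration, not real schedules.

```python
# How unlikely is a .500 record against a full field of equally matched rivals?
from math import comb

n_meets = 20
n_rivals = 8               # athletes per event, all assumed interchangeable
p_chance = 1 / n_rivals    # each would win 12.5% of events if truly equal

# Probability an interchangeable athlete wins at least half of the meets by luck:
p_lucky = sum(comb(n_meets, k) * p_chance**k * (1 - p_chance)**(n_meets - k)
              for k in range(n_meets // 2, n_meets + 1))
print(p_lucky)             # roughly 5e-5: a .500 record against such a field is domination
```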
But then, what if Bolt and Phelps won 49% of the time, and some other competitor won 50% of the time? What then? Here, criteria are important. Most of the time, we say "better" meaning, well, that something is better; generally, we aren't specific about what we mean by it.
In the book, Simmons ranks his top 96 players in a pyramid schematic. He is rather specific about what he wants in a player. And as one expects, he is specific about the types of intangibles his basketball player should have (basically, basketball sense, i.e. The Secret; whether he made his teammates better; winnability; and whether you would choose him if your life depended on this one guy winning you a title). The evaluation of those intangibles, however, is not as precise as he'd like. The advantage, though, is that one might be able to answer "why" questions. In some cases, Simmons seemingly ranked two players differently while giving them the same arguments (like the consistency of Tim Duncan and John Stockton; somehow, Stockton just rubbed Simmons the wrong way, while Duncan's consistency makes him the seventh best player of all time). And given his emphasis on projecting Bill Russell's game into the modern era, it seemed like Russell should have ranked lower. On occasion, I was left with the feeling that the arguments did not match the ranking. From what he said about stat inflation and how Wilt didn't get The Secret, I thought Wilt would be ranked lower than sixth.
Dave Berri has the opposite problem: he has a mathematically defined metric, and when he says better or worse, he means that this metric is higher or lower between the players being compared. He can further break down the stat to show where a player is good or deficient (whether shooting percentage, blocks, turnovers, fouls, steals, and assists are above or below average). He can tell you the hows, with his model spitting out a number that combines these different performance stats into a metric of productivity. But he simply ranks players numerically, without talking about how one might actually see these differences between players (and one might not be able to see them… it could be one more missed shot or one fewer rebound every couple of games).
I am amazed that Simmons cannot reconcile eyeball and statistical information. Just about every time Simmons bitches out scorers, he talks about how the player didn't get "The Secret". It isn't about scoring; it's about having a complete game. It is about making the team better with the skills you have. To top it off, Simmons then says that point getters are one-dimensional. You can't shy away from rebounds. It's great to have a few steals and blocks. Sure, not every athlete can do it all, and certainly not as prolifically as the superstars, but you can't avoid doing those things.
I'm sure Berri is nodding his head, agreeing with Simmons. Point getting isn't the same as being an efficient shooter (at least average field-goal and free-throw percentages). And you certainly can't be below average in the other areas if you want to help your team.
But Berri generally writes about the average. Simmons focuses on the standard deviations. He doesn’t just care about the scoring line; he focuses on Achilles-wreaking-havoc-on-the-Trojans type of performances. He loves the stories of Jordan’s pathological competitiveness. In other words, Simmons lives for the outlier moments.
And I think therein lies the nutshell (and to borrow a Simmons device, I could have said this 5,500 words ago and shortened this review). Simmons views the out-of-normal performance as transcendent, as the mark of players who wanted something more or had something to prove. He treats the extreme as significant; he uses a back story to give the event meaning. That's fine. It's also fine when Berri (and stat heads) are constrained to treat outliers as noise (possibly) or as irrelevant to the general scope of the model, if they want a model of what usually happens and are not concerned with doing the job of a GM and a coach for free. Both have defined the game they wish to play.
When to talk…
I swear I never meant for this blog to focus so much on sports. But Dave Berri has a post that dovetails neatly with some thoughts I have regarding experts, expertise, and how the public should handle them. I think it can be interesting to approach science issues from the side, rather than head on. Specifically, three authors (Berri, Malcolm Gladwell, and Steven Pinker), all of whom I admire, have had a minor verbal tussle about the issue of expertise.
First, a digression. I was already going to comment on the interface between experts and laymen. The original impulse came about because I had just finished reading Trust Us, We're Experts! by Sheldon Rampton and John Stauber. Like other books of this ilk, it spends many chapters recounting the failures of authority figures and the exploitation of those failings by people who follow the profit motive to an extreme degree. Although the title hints at a broadside against the arrogance of scientists, the book is really about the appropriation of the authority, rigor, and analysis of science to sell things. Its targets are mainly PR companies and the corporations that hire them. There are also a few choice words for scientists who become corporate flacks.
The book is lacking in its presentation, mostly because the authors avoid analyzing how one can tell good science from bad. The presentation leans on linkages between instances of corporate malfeasance; there is no analysis or data on how many companies engage PR firms for this purpose. There is no analysis of the amount of research from company scientists versus independent ones. The authors focus on the motives of corporate employees, but somehow ignore the possibility of bias within the academy. There is no attempt to identify if and when corporate research can be solid. In broad brush strokes, then, chemists who discover compounds with therapeutic potential are suspect; the same people working in academia (and presumably not standing to capitalize on the finding financially) can be trusted.
This is actually a huge problem with the book. One of the techniques Rampton and Stauber document is name-calling (good old-fashioned "going negative"; ironically enough, the PR firms would simply label all opposition as junk science) directed at research and scientists whose findings run contrary to whatever the corporations happen to be pushing. But by avoiding the main issue of identifying good and bad science, the two merely stitch together examples of corporate and public-relations collusion. Now, the evidence they present is good; they hoist PR and corporate employees by their own petards, quoting from interviews, articles written for PR workers, and internal memos. But the ultimate point is that Rampton and Stauber simply tarnish corporate research because the scientists work for corporations. I believe this is a weak argument, and ultimately a useless one. One example I can think of: what if two groups with different ideologies present contrary findings? Assuming the so-called 'profit motive' applies equally to both, or to neither, readers will have lost the major tool that Rampton and Stauber push in this book. But as I will show, the situation is not always as stark as, say, corporate shills versus academics, or creationists versus biologists. There is enough research of varied quality, published by 'honest actors', to cause plenty of head-scratching about how solid a scientific finding is.
Let's be clear, though. Of course the follow-the-money strategy is straightforward and, I would think, more likely than not correct. But that cannot be the only analysis one does; if the thesis is that PR firms use name-calling as a major tactic in discrediting good, rational, scientific research, it seems bad form to use the funding source as a way to argue that investigators funded by corporations do bad research. It's just another instance of name-calling. I expected more analysis so that we could move away from that.
And that's the unfortunate thing about a book like this; why wouldn't I want a book that causes outrage? Why, in essence, am I asking for an intellectually "pure" book, one that deals with corporate strong-arm tactics in a more methodical, scientific way? Doesn't this smack of political posturing, where somehow the result matters less than the means – and no, I do not mean that the ends justify the means. I am just pointing out that there might be multiple ways of doing something (like taking route A vs. B, or cutting costs by choosing between vendor C and vendor D). Workplace politics might elevate these mundane differences into managerial warfare. Why should I care what the politics are, so long as they lead to a desirable end result?
One problem with a book like Trust Us is that it appeals to emotions with rhetoric, without a corresponding appeal to logic. I think including analytical rigor is important, as it provides the tools for lasting impact. As written, the book (published in 2000) provides catchy examples of corporate malfeasance. The most basic motif is as follows: activists use studies that, for example, correlate lung cancer with smoking in order to drive legislation to decrease smoking. Corporations and interested parties attack by calling this bad science, by calling the researchers irresponsible, by calling the activists socialist control freaks who wish to moralize on an issue that is really a matter of personal choice. They have a considerable war chest for this sort of thing. Frankly, if that's what Rampton and Stauber are worried about, then their focus should have been on the herd mentality of people, not the fact that PR firms use negative ads.
But that is only one weapon; the other is the recruitment or outright purchase of favorable scientific articles. An example would be studies published by scientists who work for tobacco companies, studies that refute the claims of outside investigators. But Rampton and Stauber simply point out that the favorable finding comes from researchers paid by Philip Morris. That's nice, but how is this different from the name-calling Philip Morris engages in? The real issue is how one goes about identifying what bad research is.
They do throw a sop to analytical tools at the end of the book. The discussion is cursory; the focus is again on helping the reader dissociate the emotional rhetoric from the arguments (such as they are). The appeal is that the analysis is simple: just question the motives of the spokesmen and experts. Worst of all, their discussion of the difficulties of science gives the impression that the whole enterprise is a bit of a crapshoot anyway. They point out that peer review is a recent phenomenon, that grant disbursal depends upon critiques from competing scientists, and that the statistically significant differences reported are, more often than not, mundane rather than dramatic. Their discussion of p-values makes scientific conclusions sound like so much guesswork rather than the end result of hard work. Day-to-day science isn't as bad as the pair portray it.
There is a craft in taking a broad question ("How does the brain work?"), breaking it down into a model ("Let us use the olfactory system as a 'brain-network lite'"), identifying a technique that can answer a specific question ("I wonder if the intensity of a smell is related to the amount of neural activity in the olfactory system? We expect to see more synaptic transmission from the primary neurons that detect 'smells.'"), doing different experiments to get at this single question, analyzing the data, and writing up the results.
Forget the fact that different scientists have different abilities to ask and answer scientific questions; nature doesn't often give a clear answer. So yes, it is hard to arrive at conclusive statements. To confound the issue further, even good research can have flaws: unclear experimental design, incorrect analysis, and distressingly minor differences between control and test conditions. Which leads us to the question: what exactly does good research look like?
I am not going to answer this now, and I can't answer it fully. This blog will, eventually, attempt to deal with this very issue by presenting papers and research that I read, in addition to book reviews. But my point here is that Rampton and Stauber didn't address the issue either. The very end of the book is a populist appeal, one that emphasizes "common sense" over jargon and statistics. They even appeal to our civic duty, urging us to become more politically active and to associate with (my term, not theirs) "lay-experts". At some point, however, even well-informed non-scientists and non-experts must turn to experts for original research. Rather than disregard that research, then, one must learn and gain a comfort level with parsing the scientific literature.
It took a while, but we return to the Gladwell-Pinker-Berri flap. The setup is simple: Berri is a sports economist, specializing in creating models that predict athletic performance. However, he has tackled multi-player games (basketball and American football), which, presumably, would lead to complex models, or perhaps something computationally intractable. Surprisingly, he found that neither was the case. The important point this time is that he was able to show that where quarterbacks are selected in the NFL draft doesn't fit with their performance (assessed using the Berri and Simmons QB Score metric). Gladwell wrote an essay that presented Berri and Simmons's argument favorably. Pinker made a short comment refuting this, saying that QBs drafted high do have better performance.
Both Pinker's review and Gladwell's response seemed snippy to me. But what I found interesting was that while Pinker questioned Gladwell's ability as an analyst (while giving Gladwell the backhanded compliment that he is a rather gifted essayist – but not a researcher or analyst), Gladwell, in turn, questioned the backgrounds of Pinker's sources. I think Gladwell's highlighting the faults in the arguments was sufficient, as Pinker's sources are somewhat weak. It really wasn't necessary to impugn their backgrounds.
This is ironic, as Pinker raises some peripheral issues regarding Gladwell's suitability to review the research and observations of experts. Just as with Gladwell, I think Pinker gave a reasonable counter-argument to Gladwell's generally gung-ho and favorable presentation of his subjects. For example, there is a flip side to imperfect predictors: while they may not be useful for picking the most suitable candidates, they help remove the worst ones from the pool in a cost-effective way. That's an interesting point, and I think one "system" that scientists can study to answer this is… sports (because of the wealth of performance data).
There really is no need to trash an expositor just because he is a better essayist than a scientist, for instance. Isn’t Gladwell in fact an expert in conveying novel research to the public (and effectively)?
In this case, I think both the "expert" and the "lay person" gave a good accounting of their (intellectual) problems with the other. However, they both engaged in what amounted to look-at-the-source "analysis" (Pinker says Gladwell doesn't know what he writes about; Gladwell trashes Pinker's football sources for things they did that are unrelated to football). The only thing the ad hominem attacks achieved was to raise the blood pressure of both participants.